[ https://issues.apache.org/jira/browse/SOLR-1869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12854972#action_12854972 ]
Joe Calderon commented on SOLR-1869:
------------------------------------
"At the same position and Term text as the previous token" is ambiguous. I
assumed "position" meant the same start and end offsets, so I concluded there
was a bug.
I changed the filter to use CharArraySet. reset() already called
previous.clear(). Because the filter has a different name, I also attached its
accompanying factory.
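A minimal sketch of the reset idea in plain Java rather than Lucene's attribute API: the dedup state belongs to the instance, and reset() clears it so that a reused filter does not carry seen terms over from the previous document. The class name OffsetDedupState and the Key record are hypothetical. A HashSet keyed on term plus offsets stands in for CharArraySet, which stores only term text, so this does not claim to show how the attached filter actually combines CharArraySet with offsets.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for the filter's dedup state (not the attached filter).
final class OffsetDedupState {

    /** Hypothetical key: term text plus start/end offsets. */
    record Key(String term, int start, int end) {}

    private final Set<Key> seen;

    OffsetDedupState() {
        // State is created per instance in the constructor.
        this.seen = new HashSet<>();
    }

    /** Returns true if the token should be kept, false if it repeats an earlier token. */
    boolean accept(String term, int start, int end) {
        // Set.add returns false when an equal key is already present.
        return seen.add(new Key(term, start, end));
    }

    /** Clears state before the instance processes a new stream (e.g. the next document). */
    void reset() {
        seen.clear();
    }

    public static void main(String[] args) {
        OffsetDedupState state = new OffsetDedupState();
        System.out.println(state.accept("ex", 0, 2)); // true: first occurrence
        System.out.println(state.accept("ex", 0, 2)); // false: duplicate in the same stream
        state.reset();                                // start of the next document
        System.out.println(state.accept("ex", 0, 2)); // true: state was cleared
    }
}
```

Without the reset() call, the third accept would return false, and a term seen in one document would be silently dropped from the next.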
This all started because the highlighter was highlighting a term at the same
offsets twice. For example, if I had a word with a synonym, [ex-con,0,6] and
[excon,0,5], and then ran it through the EdgeNGram filter, I would end up with
two tokens [ex,0,2] with different position increments. The highlighted
snippet was then "<em>ex</em><em>ex</em>-con". I posted this on the mailing
list, and RemoveDuplicatesTokenFilter was suggested.
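To make the difference concrete, here is a minimal plain-Java sketch (no Lucene dependency) comparing two removal rules on a hypothetical stream in which "ex" at offsets [0,2] appears three times. removeByPosition follows the position-based rule: a per-position set of terms is cleared whenever the position increment is positive, so only repeats at the same position are dropped. removeByOffsets drops any token whose term and start/end offsets match an earlier token. The class, record, and method names are illustrative assumptions, not Solr API.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class DedupSketch {

    /** Simplified stand-in for a token: term text, offsets, position increment. */
    record Token(String term, int start, int end, int posInc) {}

    /** Key for the offset-based rule: same term and same start/end offsets. */
    record OffsetKey(String term, int start, int end) {}

    /** Position-based rule: drop a term already seen at the current position. */
    static List<Token> removeByPosition(List<Token> in) {
        List<Token> out = new ArrayList<>();
        Set<String> seenAtPosition = new HashSet<>();
        for (Token t : in) {
            if (t.posInc() > 0) {
                seenAtPosition.clear(); // a new position starts
            }
            if (seenAtPosition.add(t.term())) { // false if already seen here
                out.add(t);
            }
        }
        return out;
    }

    /** Offset-based rule: drop a token whose term and offsets match an earlier one. */
    static List<Token> removeByOffsets(List<Token> in) {
        List<Token> out = new ArrayList<>();
        Set<OffsetKey> seen = new HashSet<>();
        for (Token t : in) {
            if (seen.add(new OffsetKey(t.term(), t.start(), t.end()))) {
                out.add(t);
            }
        }
        return out;
    }

    /** Hypothetical stream: "ex" at [0,2] three times, spread over two positions. */
    static List<Token> sample() {
        return List.of(
            new Token("ex", 0, 2, 1),
            new Token("ex", 0, 2, 1),  // next position, same term and offsets
            new Token("ex", 0, 2, 0)); // same position as the previous token
    }

    public static void main(String[] args) {
        System.out.println(removeByPosition(sample()).size()); // prints 2
        System.out.println(removeByOffsets(sample()).size());  // prints 1
    }
}
```

The position-based rule keeps the second "ex" because it starts a new position, which is the double highlight described above; the offset-based rule keeps only the first. Two trade-offs of this sketch: the offset set is never cleared within a stream, so it grows with the stream's length, and a dropped token's position increment is simply discarded rather than carried forward to the next kept token.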
> RemoveDuplicatesTokenFilter doest have expected behaviour
> ---------------------------------------------------------
>
> Key: SOLR-1869
> URL: https://issues.apache.org/jira/browse/SOLR-1869
> Project: Solr
> Issue Type: New Feature
> Components: Schema and Analysis
> Reporter: Joe Calderon
> Priority: Minor
> Attachments: RemoveDupOffsetTokenFilter.java,
> RemoveDupOffsetTokenFilterFactory.java, SOLR-1869.patch
>
>
> The RemoveDuplicatesTokenFilter seems broken, as it initializes its map and
> attributes at the class level and not within its constructor.
> In addition, I would think the expected behaviour would be to remove
> identical terms with the same offsets. Instead, it looks like it removes
> duplicates based on position increment, which won't work when it is used
> after something like the EdgeNGram filter. When I posted this to the mailing
> list, even Erik Hatcher seemed to think that's what this filter was supposed
> to do.
> Attaching a patch that has the expected behaviour and initializes variables
> in the constructor.