[
https://issues.apache.org/jira/browse/SOLR-234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12495202
]
Hoss Man commented on SOLR-234:
-------------------------------
> I'd think that updating the offsets is almost always the right thing to do
> (and should be the default?), given that spaces will
> almost always come from the field value itself.
i don't follow your reasoning ... the offsets are supposed to denote where in
the original text the Token came from ... a Filter can't make any assumptions
about the source of the tokens except the token itself, so i don't see why a
Filter would by default assume it can muck with the offsets.
In Ryan's use case he may want his highlighter-esque code to be able to know
how many characters were trimmed off of each end -- and i buy that it makes
sense for TrimFilter to have an option to relay that info by modifying the
offset -- but joe random user should be able to expect that by default the
offsets of the Tokens his tokenizer produces won't be modified ... i would
personally think it's a bug to get the behavior ryan describes out of a
highlighter if i knew that my tokenizer was only splitting on punctuation.
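
To make the opt-in behavior concrete, here is a minimal sketch of a trim step that leaves offsets alone unless asked otherwise. It is not the attached patch and does not use the Lucene API: SketchToken, TrimSketch, and the updateOffsets flag are hypothetical names chosen for illustration. The offset adjustment also assumes the token text was copied verbatim from the original field value, which is exactly the assumption a Filter cannot make in general.

```java
// Hypothetical stand-in for a token: text plus offsets into the original
// field value. This is NOT Lucene's Token class.
final class SketchToken {
    final String text;
    final int startOffset;
    final int endOffset;

    SketchToken(String text, int startOffset, int endOffset) {
        this.text = text;
        this.startOffset = startOffset;
        this.endOffset = endOffset;
    }
}

final class TrimSketch {
    // Trims leading and trailing whitespace from the token text.
    // Offsets are passed through unchanged unless updateOffsets is true.
    static SketchToken trim(SketchToken in, boolean updateOffsets) {
        String text = in.text;
        int start = 0;
        int end = text.length();
        while (start < end && Character.isWhitespace(text.charAt(start))) {
            start++;
        }
        while (end > start && Character.isWhitespace(text.charAt(end - 1))) {
            end--;
        }
        String trimmed = text.substring(start, end);

        if (!updateOffsets) {
            // Default: the offsets produced by the tokenizer are preserved.
            return new SketchToken(trimmed, in.startOffset, in.endOffset);
        }

        // Opt-in: shift offsets by the number of characters trimmed from
        // each end. Only meaningful if the text mirrors the original input.
        int trailing = text.length() - end;
        return new SketchToken(trimmed, in.startOffset + start, in.endOffset - trailing);
    }
}
```

With a token " foo  " at offsets 10 to 16, the default call keeps 10 and 16, while the opt-in call reports 11 and 14, so a caller can tell how many characters were trimmed from each end.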
> TrimFilter should update the start and end offsets
> --------------------------------------------------
>
> Key: SOLR-234
> URL: https://issues.apache.org/jira/browse/SOLR-234
> Project: Solr
> Issue Type: Improvement
> Reporter: Ryan McKinley
> Priority: Minor
> Attachments: SOLR-234-TrimFilterOffsets.patch,
> SOLR-234-TrimFilterOffsets.patch
>
>
> As implemented, the TrimFilter only trims the text. It does not update the
> startOffset and endOffset.
> see:
> http://www.nabble.com/TrimFilter----t.startOffset%28%29%2C-t.endOffset%28%29-tf3728875.html