[ 
https://issues.apache.org/jira/browse/SOLR-234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12495202
 ] 

Hoss Man commented on SOLR-234:
-------------------------------

> I'd think that updating the offsets is almost always the right thing to do 
> (and should be the default?), given that spaces will 
> almost always come from the field value itself.

i don't follow your reasoning ... the offsets are supposed to denote where in 
the original text the Token came from ... a Filter can't make any assumptions 
about the source of the tokens except the token itself, so i don't see why a 
Filter would by default assume it can muck with the offsets.

In Ryan's use case he may want his highlighter-esque code to be able to know 
how many characters were trimmed off of each end -- and i buy that it makes 
sense for TrimFilter to have an option to relay that info by modifying the 
offset -- but joe random user should be able to expect that by default the 
offsets of the Tokens his tokenizer produces won't be modified ... i would 
personally think it's a bug to get the behavior ryan describes out of a 
highlighter if i knew that my tokenizer was only splitting on punctuation.
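
A rough sketch of the opt-in behavior described above, not the attached patch: 
SketchToken, TrimSketch and the updateOffsets flag are hypothetical names made 
up for illustration, and the token class is a plain stand-in rather than the 
Lucene Token API. With the flag off the offsets pass through untouched; with 
it on they are narrowed by the number of characters trimmed from each end.

```java
// Hypothetical stand-in for a token: term text plus offsets into the original field value.
final class SketchToken {
    final String text;
    final int startOffset;
    final int endOffset;

    SketchToken(String text, int startOffset, int endOffset) {
        this.text = text;
        this.startOffset = startOffset;
        this.endOffset = endOffset;
    }
}

final class TrimSketch {
    // Trims leading/trailing whitespace (chars <= ' ', the same rule String.trim() uses).
    // Offsets are only touched when the caller explicitly asks for it.
    static SketchToken trim(SketchToken in, boolean updateOffsets) {
        String s = in.text;
        int begin = 0;
        int end = s.length();
        while (begin < end && s.charAt(begin) <= ' ') {
            begin++;
        }
        while (end > begin && s.charAt(end - 1) <= ' ') {
            end--;
        }
        String trimmed = s.substring(begin, end);

        if (!updateOffsets) {
            // Default: leave the offsets exactly as the tokenizer produced them.
            return new SketchToken(trimmed, in.startOffset, in.endOffset);
        }
        // Opt-in: assumes the trimmed characters really came from the field value,
        // which is the assumption a filter cannot verify on its own.
        int trimmedFromEnd = s.length() - end;
        return new SketchToken(trimmed, in.startOffset + begin, in.endOffset - trimmedFromEnd);
    }
}
```

For a token " foo  " at offsets 10-16, the default call keeps 10-16, while the 
opt-in call reports 11-14.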

> TrimFilter should update the start and end offsets
> --------------------------------------------------
>
>                 Key: SOLR-234
>                 URL: https://issues.apache.org/jira/browse/SOLR-234
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Ryan McKinley
>            Priority: Minor
>         Attachments: SOLR-234-TrimFilterOffsets.patch, 
> SOLR-234-TrimFilterOffsets.patch
>
>
> As implemented, the TrimFilter only trims the text.  It does not update the 
> startOffset and endOffset
> see:
> http://www.nabble.com/TrimFilter----t.startOffset%28%29%2C-t.endOffset%28%29-tf3728875.html

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
