[ https://issues.apache.org/jira/browse/SOLR-234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12495147 ]

Yonik Seeley commented on SOLR-234:
-----------------------------------

Updating the offsets does seem like the right thing to do.

I imagine using toCharArray() will be slower than using charAt(), since
toCharArray() allocates a new array. The number of charAt() calls should be
low in the average case, because there will usually be only a small amount of
whitespace to skip.
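
For reference, here is a rough sketch of the charAt() approach that also fixes
the offsets. It assumes the token text maps one-to-one onto the original input
(so endOffset - startOffset == text length). The class and method names are
hypothetical and are not Solr's actual TrimFilter code:

```java
/**
 * Hypothetical sketch: trim leading/trailing whitespace from a token's text
 * using charAt() (no array copy) and shift the offsets to match.
 */
public final class TrimOffsets {

    /** Holder for the trimmed text and its adjusted offsets. */
    public static final class Trimmed {
        public final String text;
        public final int startOffset;
        public final int endOffset;

        Trimmed(String text, int startOffset, int endOffset) {
            this.text = text;
            this.startOffset = startOffset;
            this.endOffset = endOffset;
        }
    }

    public static Trimmed trim(String text, int startOffset, int endOffset) {
        int start = 0;
        int end = text.length();
        // Skip leading whitespace; typically only a few charAt() calls.
        while (start < end && Character.isWhitespace(text.charAt(start))) {
            start++;
        }
        // Skip trailing whitespace.
        while (end > start && Character.isWhitespace(text.charAt(end - 1))) {
            end--;
        }
        // Shift each offset by the number of characters removed on that side.
        // Assumes endOffset - startOffset == text.length().
        return new Trimmed(text.substring(start, end),
                startOffset + start,
                endOffset - (text.length() - end));
    }

    public static void main(String[] args) {
        Trimmed t = trim("  foo ", 10, 16);
        System.out.println(t.text + " " + t.startOffset + " " + t.endOffset);
    }
}
```

For "  foo " at offsets 10-16, this prints "foo 12 15": two characters are
dropped from the front and one from the back.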

Isn't it annoying that Java never seems to let you do things as efficiently as 
the class lib itself...

Another issue here is that the position increment isn't maintained.
And yet another, future, issue is that payloads (a feature in a newer version
of Lucene) aren't maintained either.
I'll bring up the latter issue on the Lucene list since I think it's a bit of a
design flaw.
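
One way a filter could avoid dropping that state is to derive the trimmed token
from the input token instead of building a fresh one, so fields the filter
doesn't touch (position increment, payload) carry over. This is a hypothetical
sketch: SimpleToken is a simplified stand-in with an assumed field set, not
Lucene's Token class:

```java
/**
 * Hypothetical sketch (not Lucene's API): the trimmed token is derived from
 * the input token, so the position increment and payload are preserved.
 */
public final class TrimPreservingState {

    /** Simplified stand-in for a token; the field set is an assumption. */
    public static final class SimpleToken {
        public final String text;
        public final int startOffset;
        public final int endOffset;
        public final int positionIncrement;
        public final byte[] payload; // may be null

        public SimpleToken(String text, int startOffset, int endOffset,
                           int positionIncrement, byte[] payload) {
            this.text = text;
            this.startOffset = startOffset;
            this.endOffset = endOffset;
            this.positionIncrement = positionIncrement;
            this.payload = payload;
        }

        /** Copies every field except the text and offsets. */
        public SimpleToken withTextAndOffsets(String newText, int newStart, int newEnd) {
            return new SimpleToken(newText, newStart, newEnd, positionIncrement, payload);
        }
    }

    public static SimpleToken trim(SimpleToken in) {
        String s = in.text;
        int start = 0;
        int end = s.length();
        while (start < end && Character.isWhitespace(s.charAt(start))) start++;
        while (end > start && Character.isWhitespace(s.charAt(end - 1))) end--;
        if (start == 0 && end == s.length()) {
            return in; // nothing to trim; reuse the token unchanged
        }
        return in.withTextAndOffsets(s.substring(start, end),
                in.startOffset + start,
                in.endOffset - (s.length() - end));
    }

    public static void main(String[] args) {
        SimpleToken in = new SimpleToken(" bar", 4, 8, 2, new byte[] {1, 2});
        SimpleToken out = trim(in);
        System.out.println(out.text + " " + out.startOffset + " " + out.endOffset
                + " posInc=" + out.positionIncrement
                + " payloadLen=" + out.payload.length);
    }
}
```

The point of withTextAndOffsets() is that the trim logic never has to know
which other fields exist; in this sketch, anything added to SimpleToken later
only needs to be copied in that one method.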

> TrimFilter should update the start and end offsets
> --------------------------------------------------
>
>                 Key: SOLR-234
>                 URL: https://issues.apache.org/jira/browse/SOLR-234
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Ryan McKinley
>            Priority: Minor
>         Attachments: SOLR-234-TrimFilterOffsets.patch
>
>
> As implemented, the TrimFilter only trims the text.  It does not update the
> startOffset and endOffset.
> see:
> http://www.nabble.com/TrimFilter----t.startOffset%28%29%2C-t.endOffset%28%29-tf3728875.html

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
