[
https://issues.apache.org/jira/browse/SOLR-1423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12755564#action_12755564
]
Uwe Schindler commented on SOLR-1423:
-------------------------------------
Then you could use SOLR-1423-fix-empty-tokens.patch; it should work. The
comparison with String.split() in one of the tests was commented out because
it does not work with the tokenizer (empty tokens are not returned). I only
wanted to check that the offsets are calculated correctly. The second test
does this, but I want to be sure that they are correct for both group=-1 and
group>=0.
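
For illustration only, here is a minimal sketch of the mismatch (not the
actual test code): String.split() keeps interior empty strings, so comparing
its result one-to-one against a tokenizer that drops empty tokens fails.

    // Sketch only: shows why a String.split() comparison breaks when the
    // tokenizer does not return empty tokens.
    String text = "one,,two";
    String[] parts = text.split(",");   // ["one", "", "two"] - interior empty token kept
    // A pattern tokenizer that skips empty tokens would emit only "one" and
    // "two", so the arrays no longer line up element by element.
    for (String p : parts) {
        System.out.println("'" + p + "'");
    }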
> Lucene 2.9 RC4 may need some changes in Solr Analyzers using CharStream &
> others
> --------------------------------------------------------------------------------
>
> Key: SOLR-1423
> URL: https://issues.apache.org/jira/browse/SOLR-1423
> Project: Solr
> Issue Type: Task
> Components: Analysis
> Affects Versions: 1.4
> Reporter: Uwe Schindler
> Assignee: Koji Sekiguchi
> Fix For: 1.4
>
> Attachments: SOLR-1423-FieldType.patch,
> SOLR-1423-fix-empty-tokens.patch, SOLR-1423-with-empty-tokens.patch,
> SOLR-1423.patch, SOLR-1423.patch, SOLR-1423.patch
>
>
> Because of some backwards compatibility problems (LUCENE-1906) we changed the
> CharStream/CharFilter API a little bit. Tokenizer now only has an input field
> of type java.io.Reader (as before the CharStream code). To correct offsets,
> it is now necessary to call the Tokenizer.correctOffset(int) method, which
> delegates to the CharStream (if input is a subclass of CharStream) and
> otherwise returns the offset uncorrected. Normally it is enough to change all
> occurrences of input.correctOffset() to this.correctOffset() in Tokenizers.
> It should also be checked whether custom Tokenizers in Solr correct their
> offsets.
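
As a rough sketch of what that change looks like in a custom tokenizer (a
hypothetical WholeInputTokenizer written for illustration, not code from the
attached patches; attribute API as in Lucene 2.9):

    // Hypothetical tokenizer, only to illustrate the offset-correction change.
    import java.io.IOException;
    import java.io.Reader;
    import org.apache.lucene.analysis.Tokenizer;
    import org.apache.lucene.analysis.tokenattributes.OffsetAttribute;
    import org.apache.lucene.analysis.tokenattributes.TermAttribute;

    public class WholeInputTokenizer extends Tokenizer {
      private final TermAttribute termAtt = (TermAttribute) addAttribute(TermAttribute.class);
      private final OffsetAttribute offsetAtt = (OffsetAttribute) addAttribute(OffsetAttribute.class);
      private boolean done = false;

      public WholeInputTokenizer(Reader input) {
        super(input);
      }

      public boolean incrementToken() throws IOException {
        if (done) return false;
        done = true;
        clearAttributes();
        // Read the whole input and emit it as a single token.
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[1024];
        int len;
        while ((len = input.read(buf)) > 0) {
          sb.append(buf, 0, len);
        }
        termAtt.setTermBuffer(sb.toString());
        // Before LUCENE-1906 the field was a CharStream, so one would call
        // input.correctOffset(...). Now input is a plain Reader and the
        // Tokenizer's own correctOffset(...) must be used; it delegates to the
        // CharStream if input is one, otherwise it returns the offset as-is.
        offsetAtt.setOffset(correctOffset(0), correctOffset(sb.length()));
        return true;
      }
    }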
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.