[
https://issues.apache.org/jira/browse/SOLR-1423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Uwe Schindler updated SOLR-1423:
--------------------------------
Attachment: SOLR-1423-with-empty-tokens.patch
Some refactoring (I moved the PatternTokenizer to its own class, like
PatternReplaceFilter). This patch is functionally identical to current trunk,
but it is more efficient, uses the new TokenStream API, and implements end()
(which sets the offset to the end of the string).
I will soon post a patch that removes empty tokens.
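
To illustrate what "with empty tokens" and the final offset mean here, a minimal
sketch in plain Java follows. It is not the code from the patch and does not use
the Lucene TokenStream API; the class PatternSplitSketch, its Token type and the
keepEmpty flag are hypothetical names chosen for the illustration. It assumes the
pattern is used as a delimiter and that the final offset is the input length.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PatternSplitSketch {

    // Hypothetical token holder: text plus start (inclusive) and end (exclusive) offsets.
    static final class Token {
        final String text;
        final int start;
        final int end;

        Token(String text, int start, int end) {
            this.text = text;
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return "'" + text + "'[" + start + "," + end + ")";
        }
    }

    // Splits input on every match of the delimiter pattern.
    // keepEmpty = true keeps zero-length tokens; false drops them.
    static List<Token> split(String input, Pattern delimiter, boolean keepEmpty) {
        List<Token> tokens = new ArrayList<>();
        Matcher m = delimiter.matcher(input);
        int start = 0;
        while (m.find()) {
            add(tokens, input, start, m.start(), keepEmpty);
            start = m.end();
        }
        // Text after the last delimiter (possibly empty).
        add(tokens, input, start, input.length(), keepEmpty);
        return tokens;
    }

    private static void add(List<Token> tokens, String input,
                            int start, int end, boolean keepEmpty) {
        if (keepEmpty || end > start) {
            tokens.add(new Token(input.substring(start, end), start, end));
        }
    }

    // Assumption for this sketch: the final offset is the end of the input string.
    static int finalOffset(String input) {
        return input.length();
    }

    public static void main(String[] args) {
        Pattern comma = Pattern.compile(",");
        System.out.println(split("a,,b", comma, true));   // three tokens, the middle one empty
        System.out.println(split("a,,b", comma, false));  // two tokens
        System.out.println(finalOffset("a,,b"));
    }
}
```

With keepEmpty set to true the input "a,,b" yields a zero-length token between
the two commas; with false that token is skipped, which is the behaviour the
follow-up patch is meant to provide.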
> Lucene 2.9 RC4 may need some changes in Solr Analyzers using CharStream &
> others
> --------------------------------------------------------------------------------
>
> Key: SOLR-1423
> URL: https://issues.apache.org/jira/browse/SOLR-1423
> Project: Solr
> Issue Type: Task
> Components: Analysis
> Affects Versions: 1.4
> Reporter: Uwe Schindler
> Assignee: Koji Sekiguchi
> Fix For: 1.4
>
> Attachments: SOLR-1423-FieldType.patch,
> SOLR-1423-with-empty-tokens.patch, SOLR-1423.patch, SOLR-1423.patch,
> SOLR-1423.patch
>
>
> Because of some backwards compatibility problems (LUCENE-1906) we changed the
> CharStream/CharFilter API a little bit. Tokenizer now only has an input field
> of type java.io.Reader (as before the CharStream code). To correct offsets,
> you now need to call the Tokenizer.correctOffset(int) method, which
> delegates to the CharStream (if input is a subclass of CharStream) and
> otherwise returns the uncorrected offset. Normally it is enough to change all
> occurrences of input.correctOffset() to this.correctOffset() in Tokenizers.
> It should also be checked whether custom Tokenizers in Solr correct their
> offsets.
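
The delegation described above can be sketched in plain Java as follows. This is
a model under stated assumptions, not the Lucene source: SketchCharStream,
SkippedPrefixStream and SketchTokenizer are hypothetical stand-ins for
CharStream, a CharFilter-like reader and Tokenizer, and the fixed "skipped"
shift is invented purely to make the correction visible.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

public class CorrectOffsetSketch {

    // Stand-in for CharStream: a Reader that can map an offset in the
    // filtered text back to an offset in the original text.
    static abstract class SketchCharStream extends Reader {
        abstract int correctOffset(int currentOff);
    }

    // Hypothetical filter that behaves as if a fixed number of characters
    // had been removed before the text reached the tokenizer.
    static class SkippedPrefixStream extends SketchCharStream {
        private final Reader in;
        private final int skipped;

        SkippedPrefixStream(Reader in, int skipped) {
            this.in = in;
            this.skipped = skipped;
        }

        @Override
        int correctOffset(int currentOff) {
            return currentOff + skipped;
        }

        @Override
        public int read(char[] buf, int off, int len) throws IOException {
            return in.read(buf, off, len);
        }

        @Override
        public void close() throws IOException {
            in.close();
        }
    }

    // Stand-in for Tokenizer: the input field is a plain java.io.Reader.
    static class SketchTokenizer {
        protected final Reader input;

        SketchTokenizer(Reader input) {
            this.input = input;
        }

        // Delegates to the char stream if the input is one,
        // otherwise returns the offset unchanged.
        protected final int correctOffset(int currentOff) {
            return (input instanceof SketchCharStream)
                    ? ((SketchCharStream) input).correctOffset(currentOff)
                    : currentOff;
        }

        // Exposed only so the demo can call it; a tokenizer would call
        // this.correctOffset(...) when it sets token offsets.
        int offsetOf(int rawOffset) {
            return correctOffset(rawOffset);
        }
    }

    public static void main(String[] args) {
        SketchTokenizer plain = new SketchTokenizer(new StringReader("foo bar"));
        SketchTokenizer filtered = new SketchTokenizer(
                new SkippedPrefixStream(new StringReader("foo bar"), 3));
        System.out.println(plain.offsetOf(4));    // 4: plain Reader, no correction
        System.out.println(filtered.offsetOf(4)); // 7: shifted by the 3 skipped characters
    }
}
```

Because the tokenizer only sees a Reader, code that used to call
input.correctOffset() no longer compiles against that field; calling the
tokenizer's own method keeps the instanceof check in a single place.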