[jira] Updated: (SOLR-1423) Lucene 2.9 RC4 may need some changes in Solr Analyzers using CharStream & others

Uwe Schindler (JIRA) Tue, 15 Sep 2009 10:06:54 -0700

     [ https://issues.apache.org/jira/browse/SOLR-1423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Uwe Schindler updated SOLR-1423:
--------------------------------

    Attachment: SOLR-1423-fix-empty-tokens.patch

Attached a new patch with the empty token fix.

It adds a test for the offsets when group != -1. It is also more
efficient: it uses setTermBuffer(string, offset, len) to copy the chars
directly into the term buffer, which is faster than allocating a new
string with substring().
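
To illustrate why copying by offset and length avoids the extra allocation,
here is a minimal sketch. TermBufferSketch is a hypothetical stand-in written
for this example, not the Lucene class and not the code in the patch; it only
mimics the shape of setTermBuffer(string, offset, len) with a reusable char
array.

```java
// Hypothetical stand-in for a reusable term buffer (NOT the Lucene class).
class TermBufferSketch {
    private char[] buffer = new char[16];
    private int length = 0;

    // Copies len chars of source, starting at offset, into the reused buffer.
    // No intermediate String is created, unlike source.substring(...).
    void setTermBuffer(String source, int offset, int len) {
        if (len > buffer.length) {
            // Grow only when the region does not fit in the existing array.
            buffer = new char[Math.max(len, buffer.length * 2)];
        }
        source.getChars(offset, offset + len, buffer, 0);
        length = len;
    }

    int termLength() {
        return length;
    }

    // Only for inspection in this sketch; this call does allocate a String.
    String term() {
        return new String(buffer, 0, length);
    }
}
```

With a matcher-style start and end position, the call would be
setTermBuffer(str, start, end - start) in place of str.substring(start, end).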

> Lucene 2.9 RC4 may need some changes in Solr Analyzers using CharStream & 
> others
> --------------------------------------------------------------------------------
>
>                 Key: SOLR-1423
>                 URL: https://issues.apache.org/jira/browse/SOLR-1423
>             Project: Solr
>          Issue Type: Task
>          Components: Analysis
>    Affects Versions: 1.4
>            Reporter: Uwe Schindler
>            Assignee: Koji Sekiguchi
>             Fix For: 1.4
>
>         Attachments: SOLR-1423-FieldType.patch, 
> SOLR-1423-fix-empty-tokens.patch, SOLR-1423-fix-empty-tokens.patch, 
> SOLR-1423-with-empty-tokens.patch, SOLR-1423.patch, SOLR-1423.patch, 
> SOLR-1423.patch
>
>
> Because of some backwards compatibility problems (LUCENE-1906) we changed the 
> CharStream/CharFilter API a little bit. Tokenizer now only has an input field 
> of type java.io.Reader (as before the CharStream code). To correct offsets, 
> you now need to call the Tokenizer.correctOffset(int) method, which 
> delegates to the CharStream (if input is a subclass of CharStream) and 
> otherwise returns the uncorrected offset. Normally it is enough to change all 
> occurrences of input.correctOffset() to this.correctOffset() in Tokenizers. We 
> should also check whether custom Tokenizers in Solr correct their offsets.
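
The delegation described in the issue text can be sketched as follows. All
class names here (SketchCharStream, SkippingCharStream, SketchTokenizer) are
hypothetical stand-ins built on java.io.Reader so the example runs on its own;
they are assumptions for illustration, not the Lucene or Solr sources.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Hypothetical stand-in for a CharStream: a Reader that can map an offset
// in the filtered text back to an offset in the original text.
abstract class SketchCharStream extends Reader {
    abstract int correctOffset(int currentOff);
}

// Hypothetical filter that drops a fixed number of leading chars, so every
// offset in the filtered text is shifted by that amount.
class SkippingCharStream extends SketchCharStream {
    private final Reader in;
    private final int skipped;

    SkippingCharStream(String original, int skipped) {
        this.in = new StringReader(original.substring(skipped));
        this.skipped = skipped;
    }

    @Override
    int correctOffset(int currentOff) {
        return currentOff + skipped;
    }

    @Override
    public int read(char[] cbuf, int off, int len) throws IOException {
        return in.read(cbuf, off, len);
    }

    @Override
    public void close() throws IOException {
        in.close();
    }
}

// Hypothetical tokenizer base: input is a plain Reader, and correctOffset
// delegates only when that Reader happens to be a SketchCharStream.
class SketchTokenizer {
    protected final Reader input;

    SketchTokenizer(Reader input) {
        this.input = input;
    }

    final int correctOffset(int currentOff) {
        return (input instanceof SketchCharStream)
                ? ((SketchCharStream) input).correctOffset(currentOff)
                : currentOff;
    }
}
```

A tokenizer built on this shape calls this.correctOffset(start) and
this.correctOffset(end) when setting token offsets, so it behaves the same
whether or not a char filter wraps its input.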

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
