[ https://issues.apache.org/jira/browse/SOLR-1423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12755409#action_12755409 ]
Uwe Schindler commented on SOLR-1423:
-------------------------------------

bq. I think the empty tokens is a bug and should be omitted in this patch.

The Javadocs say that it works like String.split(), which returns empty tokens but strips empty tokens at the end of the string. Solr provided this behavior before, and this patch preserves it. The code would get simpler if the Tokenizer generally stripped empty tokens, but that would be a backwards break. I would tend to just commit and then open another issue.

bq. Very nice! Can you open a separate ticket?

Will open one about Lucene's BaseTokenStreamTestCase.

> Lucene 2.9 RC4 may need some changes in Solr Analyzers using CharStream & others
> --------------------------------------------------------------------------------
>
>                 Key: SOLR-1423
>                 URL: https://issues.apache.org/jira/browse/SOLR-1423
>             Project: Solr
>          Issue Type: Task
>          Components: Analysis
>    Affects Versions: 1.4
>            Reporter: Uwe Schindler
>            Assignee: Koji Sekiguchi
>             Fix For: 1.4
>
>         Attachments: SOLR-1423-FieldType.patch, SOLR-1423.patch, SOLR-1423.patch, SOLR-1423.patch
>
>
> Because of some backwards-compatibility problems (LUCENE-1906) we changed the CharStream/CharFilter API a little bit. Tokenizer now only has an input field of type java.io.Reader (as before the CharStream code). To correct offsets, it is now necessary to call the Tokenizer.correctOffset(int) method, which delegates to the CharStream (if input is a subclass of CharStream) and otherwise returns the offset uncorrected. Normally it is enough to change all occurrences of input.correctOffset() to this.correctOffset() in Tokenizers. It should also be checked whether custom Tokenizers in Solr correct their offsets.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
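The delegation pattern described in the issue (correctOffset returning a corrected offset only when the input Reader is a CharStream) can be sketched in plain Java. This is a minimal, self-contained illustration, not the actual Lucene 2.9 classes: the names CharStream and Tokenizer.correctOffset come from the issue text, while ShiftedCharStream and its fixed-shift behavior are hypothetical stand-ins for a real CharFilter.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

/** Sketch of a Reader that can map output offsets back to original-input offsets. */
abstract class CharStream extends Reader {
    public abstract int correctOffset(int currentOff);
}

/** Hypothetical CharStream pretending an upstream filter removed `removed` leading chars. */
class ShiftedCharStream extends CharStream {
    private final Reader in;
    private final int removed;

    ShiftedCharStream(Reader in, int removed) {
        this.in = in;
        this.removed = removed;
    }

    @Override
    public int correctOffset(int currentOff) {
        return currentOff + removed; // shift back to original-text coordinates
    }

    @Override
    public int read(char[] cbuf, int off, int len) throws IOException {
        return in.read(cbuf, off, len);
    }

    @Override
    public void close() throws IOException {
        in.close();
    }
}

/** Sketch of the new Tokenizer base: input is a plain java.io.Reader. */
abstract class Tokenizer {
    protected Reader input;

    protected Tokenizer(Reader input) {
        this.input = input;
    }

    /** Delegates to the CharStream if input is one, else returns the offset unchanged. */
    protected final int correctOffset(int currentOff) {
        return (input instanceof CharStream)
                ? ((CharStream) input).correctOffset(currentOff)
                : currentOff;
    }
}

public class CorrectOffsetDemo {
    public static void main(String[] args) {
        // Plain Reader input: this.correctOffset() is a no-op.
        Tokenizer plain = new Tokenizer(new StringReader("abc")) {};
        System.out.println(plain.correctOffset(2)); // prints 2

        // CharStream input that stripped 5 chars: offsets are corrected.
        Tokenizer filtered =
                new Tokenizer(new ShiftedCharStream(new StringReader("abc"), 5)) {};
        System.out.println(filtered.correctOffset(2)); // prints 7
    }
}
```

This is why the migration is usually just s/input.correctOffset(...)/this.correctOffset(...)/: the base-class method performs the instanceof check, so Tokenizers no longer need to know whether their input was wrapped in a CharFilter.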