[ https://issues.apache.org/jira/browse/SOLR-1423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Uwe Schindler updated SOLR-1423:
--------------------------------
    Attachment: SOLR-1423-fix-empty-tokens.patch

Attached a new patch with the empty-token fix. It adds a test for the offsets when group != -1. It is also better optimized: it uses setTermBuffer(string, offset, len) to copy the chars into the term buffer, which is faster than allocating a new string with substring().

> Lucene 2.9 RC4 may need some changes in Solr Analyzers using CharStream & others
> --------------------------------------------------------------------------------
>
>                 Key: SOLR-1423
>                 URL: https://issues.apache.org/jira/browse/SOLR-1423
>             Project: Solr
>          Issue Type: Task
>          Components: Analysis
>   Affects Versions: 1.4
>            Reporter: Uwe Schindler
>            Assignee: Koji Sekiguchi
>             Fix For: 1.4
>
>         Attachments: SOLR-1423-FieldType.patch, SOLR-1423-fix-empty-tokens.patch, SOLR-1423-fix-empty-tokens.patch, SOLR-1423-with-empty-tokens.patch, SOLR-1423.patch, SOLR-1423.patch, SOLR-1423.patch
>
>
> Because of some backwards compatibility problems (LUCENE-1906) we changed the CharStream/CharFilter API a little bit. Tokenizer now only has an input field of type java.io.Reader (as it did before the CharStream code). To correct offsets, Tokenizers now need to call the Tokenizer.correctOffset(int) method, which delegates to the CharStream (if input is a subclass of CharStream) and otherwise returns the uncorrected offset. Normally it is enough to change all occurrences of input.correctOffset() to this.correctOffset() in Tokenizers. It should also be checked whether custom Tokenizers in Solr correct their offsets.
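The delegation described in the issue can be illustrated with a small, self-contained sketch. This is not the Lucene or Solr source: SketchCharStream, PrefixSkippingStream, SketchTokenizer and WholeInputTokenizer are hypothetical stand-ins invented for illustration, and only the shape of correctOffset(int) (delegate if input is a CharStream, else return the offset unchanged) is taken from the description above.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical stand-in for a CharStream: a Reader that can map offsets in
// its filtered output back to offsets in the original text.
abstract class SketchCharStream extends Reader {
    abstract int correctOffset(int currentOff);
}

// Hypothetical filter that drops a fixed-length prefix, so every offset in
// its output is shifted by the number of skipped chars.
class PrefixSkippingStream extends SketchCharStream {
    private final Reader in;
    private final int skipped;

    PrefixSkippingStream(String text, int prefixLength) {
        this.skipped = Math.min(prefixLength, text.length());
        this.in = new StringReader(text.substring(skipped));
    }

    @Override
    int correctOffset(int currentOff) {
        return currentOff + skipped;
    }

    @Override
    public int read(char[] buf, int off, int len) throws IOException {
        return in.read(buf, off, len);
    }

    @Override
    public void close() throws IOException {
        in.close();
    }
}

// Hypothetical stand-in for Tokenizer: the input field is a plain Reader,
// and offset correction goes through this.correctOffset(int).
abstract class SketchTokenizer {
    protected final Reader input;

    SketchTokenizer(Reader input) {
        this.input = input;
    }

    // Delegates to the stream if it can correct offsets, otherwise
    // returns the offset unchanged.
    protected final int correctOffset(int currentOff) {
        return (input instanceof SketchCharStream)
                ? ((SketchCharStream) input).correctOffset(currentOff)
                : currentOff;
    }
}

// Emits a single token covering the whole input and reports its
// corrected {start, end} offsets.
class WholeInputTokenizer extends SketchTokenizer {
    WholeInputTokenizer(Reader input) {
        super(input);
    }

    int[] tokenOffsets() {
        try {
            int length = 0;
            char[] buf = new char[256];
            int n;
            while ((n = input.read(buf, 0, buf.length)) != -1) {
                length += n;
            }
            // this.correctOffset(...) rather than input.correctOffset(...)
            return new int[] { correctOffset(0), correctOffset(length) };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With a plain StringReader the tokenizer reports raw offsets. With the prefix-skipping stream the same tokenizer code reports offsets shifted back into the original text, which is the behavior a custom Tokenizer loses if it never calls correctOffset.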