Brandon,

Looks like SOLR-2509 <https://issues.apache.org/jira/browse/SOLR-2509> fixed the problem - that's where OffsetAttribute was added (as you noted).
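For context, the essence of that fix: the converter carries the analyzer's OffsetAttribute values through to the Tokens it emits, instead of stamping every sub-token with the offsets of the whole matched word. A minimal sketch of the idea (not the actual SOLR-2509 patch; "analyze", "offset", and "result" are illustrative names, and the real SpellingQueryConverter differs in its details):

    import java.io.IOException;
    import java.io.StringReader;
    import java.util.Collection;
    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.Token;
    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
    import org.apache.lucene.analysis.tokenattributes.OffsetAttribute;

    // Analyze one word from the query; "offset" is the word's position in
    // the full query string, so emitted offsets stay anchored to the query.
    static void analyze(Analyzer analyzer, String word, int offset,
                        Collection<Token> result) throws IOException {
      TokenStream stream = analyzer.tokenStream("", new StringReader(word));
      CharTermAttribute termAtt = stream.addAttribute(CharTermAttribute.class);
      OffsetAttribute offsetAtt = stream.addAttribute(OffsetAttribute.class);
      stream.reset();
      while (stream.incrementToken()) {
        // Pre-fix, sub-tokens of "another-test" all got the whole word's
        // offsets (0 and 12); using OffsetAttribute yields 0-7 and 8-12.
        result.add(new Token(termAtt.toString(),
            offset + offsetAtt.startOffset(),
            offset + offsetAtt.endOffset()));
      }
      stream.end();
      stream.close();
    }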
I ran my test method on branches/lucene_solr_3_5/ and got the same failure there as you did, so I can confirm that Solr 3.5 has this bug and that it will be fixed in Solr 3.6.

Steve

> -----Original Message-----
> From: Brandon Fish [mailto:brandon.j.f...@gmail.com]
> Sent: Thursday, December 15, 2011 6:16 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Is there an issue with hyphens in SpellChecker with
> StandardTokenizer?
>
> Yes, branch_3x works for me as well. The addition of the OffsetAttribute
> probably corrected this issue. I will either switch to WhitespaceAnalyzer,
> patch my distribution, or wait for 3.6 to resolve this.
>
> Thanks.
>
> On Thu, Dec 15, 2011 at 4:17 PM, Brandon Fish
> <brandon.j.f...@gmail.com> wrote:
>
> > Hi Steve,
> >
> > I was using branch 3.5. I will try this on the tip of branch_3x too.
> >
> > Thanks.
> >
> > On Thu, Dec 15, 2011 at 4:14 PM, Steven A Rowe <sar...@syr.edu> wrote:
> >
> >> Hi Brandon,
> >>
> >> When I add the following to SpellingQueryConverterTest.java on the tip
> >> of branch_3x (will be released as Solr 3.6), the test succeeds:
> >>
> >> @Test
> >> public void testStandardAnalyzerWithHyphen() {
> >>   SpellingQueryConverter converter = new SpellingQueryConverter();
> >>   converter.init(new NamedList());
> >>   converter.setAnalyzer(new StandardAnalyzer(Version.LUCENE_35));
> >>   String original = "another-test";
> >>   Collection<Token> tokens = converter.convert(original);
> >>   assertTrue("tokens is null and it shouldn't be", tokens != null);
> >>   assertEquals("tokens Size: " + tokens.size() + " is not 2",
> >>       2, tokens.size());
> >>   assertTrue("Token offsets do not match",
> >>       isOffsetCorrect(original, tokens));
> >> }
> >>
> >> What version of Solr/Lucene are you using?
> >>
> >> Steve
> >>
> >> > -----Original Message-----
> >> > From: Brandon Fish [mailto:brandon.j.f...@gmail.com]
> >> > Sent: Thursday, December 15, 2011 3:08 PM
> >> > To: solr-user@lucene.apache.org
> >> > Subject: Is there an issue with hyphens in SpellChecker with
> >> > StandardTokenizer?
> >> >
> >> > I am getting an error using the SpellChecker component with the query
> >> > "another-test":
> >> >
> >> >   java.lang.StringIndexOutOfBoundsException: String index out of range: -7
> >> >
> >> > This appears to be related to this issue
> >> > <https://issues.apache.org/jira/browse/SOLR-1630>, which has been
> >> > marked as fixed. The configuration and test case that follow appear
> >> > to reproduce the error I am seeing: both "another" and "test" get
> >> > turned into tokens with start and end offsets of 0 and 12.
> >> >
> >> > <analyzer>
> >> >   <tokenizer class="solr.StandardTokenizerFactory"/>
> >> >   <filter class="solr.StopFilterFactory" ignoreCase="true"
> >> >           words="stopwords.txt"/>
> >> >   <filter class="solr.LowerCaseFilterFactory"/>
> >> > </analyzer>
> >> >
> >> > &spellcheck=true&spellcheck.collate=true
> >> >
> >> > Is this an issue with my configuration/test, or is there an issue
> >> > with the SpellingQueryConverter? Is there a recommended workaround,
> >> > such as the WhitespaceTokenizer mentioned in the issue comments?
> >> >
> >> > Thank you for your help.
> >> >
> >> > package org.apache.solr.spelling;
> >> >
> >> > import static org.junit.Assert.assertTrue;
> >> >
> >> > import java.util.Collection;
> >> >
> >> > import org.apache.lucene.analysis.Token;
> >> > import org.apache.lucene.analysis.standard.StandardAnalyzer;
> >> > import org.apache.lucene.util.Version;
> >> > import org.apache.solr.common.util.NamedList;
> >> > import org.junit.Test;
> >> >
> >> > public class SimpleQueryConverterTest {
> >> >
> >> >   @Test
> >> >   public void testSimpleQueryConversion() {
> >> >     SpellingQueryConverter converter = new SpellingQueryConverter();
> >> >     converter.init(new NamedList());
> >> >     converter.setAnalyzer(new StandardAnalyzer(Version.LUCENE_35));
> >> >     String original = "another-test";
> >> >     Collection<Token> tokens = converter.convert(original);
> >> >     assertTrue("Token offsets do not match",
> >> >         isOffsetCorrect(original, tokens));
> >> >   }
> >> >
> >> >   // Check that each token's start/end offsets point back at the
> >> >   // matching substring of the original query string.
> >> >   private boolean isOffsetCorrect(String s, Collection<Token> tokens) {
> >> >     for (Token token : tokens) {
> >> >       int start = token.startOffset();
> >> >       int end = token.endOffset();
> >> >       if (!s.substring(start, end).equals(token.toString()))
> >> >         return false;
> >> >     }
> >> >     return true;
> >> >   }
> >> > }
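For anyone stuck on 3.5 in the meantime, the WhitespaceAnalyzer workaround Brandon mentions can be checked with a variant of the test above. A sketch under the same class and imports (the test name is mine; it additionally needs org.apache.lucene.analysis.WhitespaceAnalyzer):

    @Test
    public void testWhitespaceAnalyzerWorkaround() {
      SpellingQueryConverter converter = new SpellingQueryConverter();
      converter.init(new NamedList());
      // WhitespaceAnalyzer does not split on hyphens, so "another-test"
      // stays one token whose 0-12 offsets match the query string, and
      // the broken per-sub-token offsets never come into play.
      converter.setAnalyzer(new WhitespaceAnalyzer(Version.LUCENE_35));
      String original = "another-test";
      Collection<Token> tokens = converter.convert(original);
      assertTrue("Token offsets do not match",
          isOffsetCorrect(original, tokens));
    }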