Hi Brandon,

When I add the following to SpellingQueryConverterTest.java on the tip of branch_3x (which will be released as Solr 3.6), the test succeeds:
@Test
public void testStandardAnalyzerWithHyphen() {
  SpellingQueryConverter converter = new SpellingQueryConverter();
  converter.init(new NamedList());
  converter.setAnalyzer(new StandardAnalyzer(Version.LUCENE_35));
  String original = "another-test";
  Collection<Token> tokens = converter.convert(original);
  assertTrue("tokens is null and it shouldn't be", tokens != null);
  assertEquals("tokens Size: " + tokens.size() + " is not 2", 2, tokens.size());
  assertTrue("Token offsets do not match", isOffsetCorrect(original, tokens));
}

What version of Solr/Lucene are you using?

Steve

> -----Original Message-----
> From: Brandon Fish [mailto:brandon.j.f...@gmail.com]
> Sent: Thursday, December 15, 2011 3:08 PM
> To: solr-user@lucene.apache.org
> Subject: Is there an issue with hyphens in SpellChecker with StandardTokenizer?
>
> I am getting an error using the SpellChecker component with the query
> "another-test":
>
> java.lang.StringIndexOutOfBoundsException: String index out of range: -7
>
> This appears to be related to this issue
> <https://issues.apache.org/jira/browse/SOLR-1630>, which has been marked
> as fixed. The configuration and test case that follow appear to reproduce
> the error I am seeing: both "another" and "test" get changed into tokens
> with start and end offsets of 0 and 12.
>
> <analyzer>
>   <tokenizer class="solr.StandardTokenizerFactory"/>
>   <filter class="solr.StopFilterFactory" ignoreCase="true"
>           words="stopwords.txt"/>
>   <filter class="solr.LowerCaseFilterFactory"/>
> </analyzer>
>
> &spellcheck=true&spellcheck.collate=true
>
> Is this an issue with my configuration/test, or is there an issue with
> the SpellingQueryConverter? Is there a recommended workaround, such as
> the WhitespaceTokenizer mentioned in the issue comments?
>
> Thank you for your help.
>
> package org.apache.solr.spelling;
>
> import static org.junit.Assert.assertTrue;
>
> import java.util.Collection;
>
> import org.apache.lucene.analysis.Token;
> import org.apache.lucene.analysis.standard.StandardAnalyzer;
> import org.apache.lucene.util.Version;
> import org.apache.solr.common.util.NamedList;
> import org.junit.Test;
>
> public class SimpleQueryConverterTest {
>   @Test
>   public void testSimpleQueryConversion() {
>     SpellingQueryConverter converter = new SpellingQueryConverter();
>     converter.init(new NamedList());
>     converter.setAnalyzer(new StandardAnalyzer(Version.LUCENE_35));
>     String original = "another-test";
>     Collection<Token> tokens = converter.convert(original);
>     assertTrue("Token offsets do not match",
>         isOffsetCorrect(original, tokens));
>   }
>
>   private boolean isOffsetCorrect(String s, Collection<Token> tokens) {
>     for (Token token : tokens) {
>       int start = token.startOffset();
>       int end = token.endOffset();
>       if (!s.substring(start, end).equals(token.toString()))
>         return false;
>     }
>     return true;
>   }
> }
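As a sketch of the WhitespaceTokenizer workaround asked about above: swapping WhitespaceAnalyzer into the same test should keep "another-test" as a single token, so no offset can point outside the original string. This is untested against this thread's setup, and the expected token count of 1 is an assumption about how SpellingQueryConverter extracts terms, not a verified result:

import static org.junit.Assert.assertEquals;
import org.apache.lucene.analysis.WhitespaceAnalyzer;

@Test
public void testWhitespaceAnalyzerWithHyphen() {
  SpellingQueryConverter converter = new SpellingQueryConverter();
  converter.init(new NamedList());
  // Whitespace-only tokenization: the hyphenated query is not split into
  // sub-tokens, so the token's offsets match the original string.
  converter.setAnalyzer(new WhitespaceAnalyzer(Version.LUCENE_35));
  String original = "another-test";
  Collection<Token> tokens = converter.convert(original);
  // Assumed: a single token covering the whole input (offsets 0 to 12).
  assertEquals("tokens Size: " + tokens.size() + " is not 1", 1, tokens.size());
  assertTrue("Token offsets do not match", isOffsetCorrect(original, tokens));
}

The server-side equivalent would be replacing solr.StandardTokenizerFactory with solr.WhitespaceTokenizerFactory in the <analyzer> above, at the cost of losing StandardTokenizer's other splitting behavior for that field type.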