Hi Brandon,

When I add the following to SpellingQueryConverterTest.java on the tip of 
branch_3x (will be released as Solr 3.6), the test succeeds:

@Test
public void testStandardAnalyzerWithHyphen() {
  SpellingQueryConverter converter = new SpellingQueryConverter();
  converter.init(new NamedList());
  converter.setAnalyzer(new StandardAnalyzer(Version.LUCENE_35));
  String original = "another-test";
  Collection<Token> tokens = converter.convert(original);
  assertTrue("tokens is null and it shouldn't be", tokens != null);
  assertEquals("tokens Size: " + tokens.size() + " is not 2", 2, tokens.size());
  assertTrue("Token offsets do not match", isOffsetCorrect(original, tokens));
}
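
For reference, here is a standalone sketch of the same offset check, with plain strings and int spans standing in for Lucene Tokens (the map-based shape is my own simplification for illustration, not the converter's actual API):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class OffsetCheck {

    /**
     * Returns true if each token's reported [start, end) span in the
     * original string reproduces the token text exactly -- the same idea
     * as the isOffsetCorrect helper in the test.
     */
    static boolean offsetsMatch(String original, Map<String, int[]> tokens) {
        for (Map.Entry<String, int[]> e : tokens.entrySet()) {
            int start = e.getValue()[0];
            int end = e.getValue()[1];
            if (start < 0 || end > original.length() || start > end
                    || !original.substring(start, end).equals(e.getKey())) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String original = "another-test";

        // Correct offsets: "another" = [0,7), "test" = [8,12)
        Map<String, int[]> good = new LinkedHashMap<>();
        good.put("another", new int[]{0, 7});
        good.put("test", new int[]{8, 12});
        System.out.println(offsetsMatch(original, good)); // true

        // The buggy behavior described below: both tokens report 0..12
        Map<String, int[]> bad = new LinkedHashMap<>();
        bad.put("another", new int[]{0, 12});
        bad.put("test", new int[]{0, 12});
        System.out.println(offsetsMatch(original, bad)); // false
    }
}
```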

What version of Solr/Lucene are you using?
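
If it turns out you are on a version that still shows the bug, the workaround mentioned in the SOLR-1630 comments is to tokenize on whitespace for spellcheck analysis. A hedged sketch of what that field type might look like (the fieldType name is just a placeholder, and I'm reusing the filters from your config):

```xml
<!-- Hypothetical spellcheck field type; the name is a placeholder. -->
<fieldType name="textSpell" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```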

Steve

> -----Original Message-----
> From: Brandon Fish [mailto:brandon.j.f...@gmail.com]
> Sent: Thursday, December 15, 2011 3:08 PM
> To: solr-user@lucene.apache.org
> Subject: Is there an issue with hypens in SpellChecker with
> StandardTokenizer?
> 
> I am getting an error using the SpellChecker component with the query
> "another-test":
> java.lang.StringIndexOutOfBoundsException: String index out of range: -7
> 
> This appears to be related to SOLR-1630
> <https://issues.apache.org/jira/browse/SOLR-1630>, which has been marked
> as fixed. My configuration and the test case below appear to reproduce
> the error I am seeing: both "another" and "test" get turned into tokens
> with a start offset of 0 and an end offset of 12.
>       <analyzer>
>         <tokenizer class="solr.StandardTokenizerFactory"/>
>         <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
>         <filter class="solr.LowerCaseFilterFactory"/>
>       </analyzer>
> 
>      &spellcheck=true&spellcheck.collate=true
> 
> Is this an issue with my configuration/test or is there an issue with the
> SpellingQueryConverter? Is there a recommended workaround, such as the
> WhitespaceTokenizer as mentioned in the issue comments?
> 
> Thank you for your help.
> 
> package org.apache.solr.spelling;
> 
> import static org.junit.Assert.assertTrue;
> 
> import java.util.Collection;
> 
> import org.apache.lucene.analysis.Token;
> import org.apache.lucene.analysis.standard.StandardAnalyzer;
> import org.apache.lucene.util.Version;
> import org.apache.solr.common.util.NamedList;
> import org.junit.Test;
> 
> public class SimpleQueryConverterTest {
> 
>   @Test
>   public void testSimpleQueryConversion() {
>     SpellingQueryConverter converter = new SpellingQueryConverter();
>     converter.init(new NamedList());
>     converter.setAnalyzer(new StandardAnalyzer(Version.LUCENE_35));
>     String original = "another-test";
>     Collection<Token> tokens = converter.convert(original);
>     assertTrue("Token offsets do not match",
>         isOffsetCorrect(original, tokens));
>   }
> 
>   private boolean isOffsetCorrect(String s, Collection<Token> tokens) {
>     for (Token token : tokens) {
>       int start = token.startOffset();
>       int end = token.endOffset();
>       if (!s.substring(start, end).equals(token.toString()))
>         return false;
>     }
>     return true;
>   }
> }