I'm running into an odd issue with multi-word synonyms in Solr (using
the latest [9/14/09] nightly ). Things generally seem to work as
expected, but I sometimes see words that are the leading term in a
multi-word synonym being replaced with the token that follows them in
the stream when they should just be ignored (i.e. there's no synonym
match for just that token). When I preview the analysis at
admin/analysis.jsp it looks fine, but at runtime I see problems like
the one in the unit test below. It's a simple case, so I assume I'm
making some sort of configuration and/or usage error.

package org.apache.solr.analysis;
import java.io.*;
import java.util.*;
import org.apache.lucene.analysis.WhitespaceTokenizer;
import org.apache.lucene.analysis.tokenattributes.TermAttribute;

public class TestMultiWordSynonmys extends junit.framework.TestCase {

  public void testMultiWordSynonmys() throws IOException {
    List<String> rules = new ArrayList<String>();
    rules.add( "a b c,d" );
    SynonymMap synMap = new SynonymMap( true );
    SynonymFilterFactory.parseRules( rules, synMap, "=>", ",", true, null);

    SynonymFilter ts = new SynonymFilter( new WhitespaceTokenizer( new
StringReader("a e")), synMap );
    TermAttribute termAtt = (TermAttribute)
ts.getAttribute(TermAttribute.class);

    ts.reset();
    List<String> tokens = new ArrayList<String>();
    while (ts.incrementToken()) tokens.add( termAtt.term() );

    // This fails because ["e","e"] is the value of the token stream
    assertEquals(Arrays.asList("a","e"), tokens);
  }
}

Any help would be much appreciated. Thanks.

--Gregg

Reply via email to