I'm running into an odd issue with multi-word synonyms in Solr (using the latest nightly, 9/14/09). Things generally work as expected, but sometimes a word that is the leading term of a multi-word synonym gets replaced by the token that follows it in the stream, when it should simply pass through unchanged (i.e., there is no synonym match for that token alone). When I preview the analysis at admin/analysis.jsp it looks fine, but at runtime I see problems like the one in the unit test below. It's a simple case, so I assume I'm making some sort of configuration and/or usage error.

package org.apache.solr.analysis;

import java.io.*;
import java.util.*;

import org.apache.lucene.analysis.WhitespaceTokenizer;
import org.apache.lucene.analysis.tokenattributes.TermAttribute;

public class TestMultiWordSynonmys extends junit.framework.TestCase {

  public void testMultiWordSynonmys() throws IOException {
    List<String> rules = new ArrayList<String>();
    rules.add( "a b c,d" );
    SynonymMap synMap = new SynonymMap( true );
    SynonymFilterFactory.parseRules( rules, synMap, "=>", ",", true, null);

    SynonymFilter ts = new SynonymFilter(
        new WhitespaceTokenizer( new StringReader("a e")), synMap );
    TermAttribute termAtt = (TermAttribute) ts.getAttribute(TermAttribute.class);

    ts.reset();
    List<String> tokens = new ArrayList<String>();
    while (ts.incrementToken()) tokens.add( termAtt.term() );

    // This fails because ["e","e"] is the value of the token stream
    assertEquals(Arrays.asList("a","e"), tokens);
  }
}

Any help would be much appreciated. Thanks.

--Gregg