Please add a Jira issue for this. It will get more attention there. BTW, thanks for creating such a precise bug report.
On Mon, Sep 14, 2009 at 1:52 PM, Gregg Donovan <gregg...@gmail.com> wrote:
> I'm running into an odd issue with multi-word synonyms in Solr (using
> the latest [9/14/09] nightly ). Things generally seem to work as
> expected, but I sometimes see words that are the leading term in a
> multi-word synonym being replaced with the token that follows them in
> the stream when they should just be ignored (i.e. there's no synonym
> match for just that token). When I preview the analysis at
> admin/analysis.jsp it looks fine, but at runtime I see problems like
> the one in the unit test below. It's a simple case, so I assume I'm
> making some sort of configuration and/or usage error.
>
> package org.apache.solr.analysis;
> import java.io.*;
> import java.util.*;
> import org.apache.lucene.analysis.WhitespaceTokenizer;
> import org.apache.lucene.analysis.tokenattributes.TermAttribute;
>
> public class TestMultiWordSynonmys extends junit.framework.TestCase {
>
>   public void testMultiWordSynonmys() throws IOException {
>     List<String> rules = new ArrayList<String>();
>     rules.add( "a b c,d" );
>     SynonymMap synMap = new SynonymMap( true );
>     SynonymFilterFactory.parseRules( rules, synMap, "=>", ",", true, null);
>
>     SynonymFilter ts = new SynonymFilter( new WhitespaceTokenizer( new
>         StringReader("a e")), synMap );
>     TermAttribute termAtt = (TermAttribute)
>         ts.getAttribute(TermAttribute.class);
>
>     ts.reset();
>     List<String> tokens = new ArrayList<String>();
>     while (ts.incrementToken()) tokens.add( termAtt.term() );
>
>     // This fails because ["e","e"] is the value of the token stream
>     assertEquals(Arrays.asList("a","e"), tokens);
>   }
> }
>
> Any help would be much appreciated. Thanks.
>
> --Gregg
>

--
Lance Norskog
goks...@gmail.com