On Thu, Sep 17, 2009 at 6:29 PM, Lance Norskog <goks...@gmail.com> wrote:
> Please add a Jira issue for this. It will get more attention there.
>
> BTW, thanks for creating such a precise bug report.

+1

Thanks, I had missed this.
This is serious, and looks due to a Lucene back compat break.
I've added the testcase and can confirm the bug.

-Yonik
http://www.lucidimagination.com

> On Mon, Sep 14, 2009 at 1:52 PM, Gregg Donovan <gregg...@gmail.com> wrote:
>> I'm running into an odd issue with multi-word synonyms in Solr (using
>> the latest [9/14/09] nightly). Things generally seem to work as
>> expected, but I sometimes see words that are the leading term in a
>> multi-word synonym being replaced with the token that follows them in
>> the stream when they should just be ignored (i.e. there's no synonym
>> match for just that token). When I preview the analysis at
>> admin/analysis.jsp it looks fine, but at runtime I see problems like
>> the one in the unit test below. It's a simple case, so I assume I'm
>> making some sort of configuration and/or usage error.
>>
>> package org.apache.solr.analysis;
>> import java.io.*;
>> import java.util.*;
>> import org.apache.lucene.analysis.WhitespaceTokenizer;
>> import org.apache.lucene.analysis.tokenattributes.TermAttribute;
>>
>> public class TestMultiWordSynonmys extends junit.framework.TestCase {
>>
>>   public void testMultiWordSynonmys() throws IOException {
>>     List<String> rules = new ArrayList<String>();
>>     rules.add( "a b c,d" );
>>     SynonymMap synMap = new SynonymMap( true );
>>     SynonymFilterFactory.parseRules( rules, synMap, "=>", ",", true, null);
>>
>>     SynonymFilter ts = new SynonymFilter(
>>         new WhitespaceTokenizer( new StringReader("a e")), synMap );
>>     TermAttribute termAtt =
>>         (TermAttribute) ts.getAttribute(TermAttribute.class);
>>
>>     ts.reset();
>>     List<String> tokens = new ArrayList<String>();
>>     while (ts.incrementToken()) tokens.add( termAtt.term() );
>>
>>     // This fails because ["e","e"] is the value of the token stream
>>     assertEquals(Arrays.asList("a","e"), tokens);
>>   }
>> }
>>
>> Any help would be much appreciated. Thanks.
>>
>> --Gregg
>>
>
> --
> Lance Norskog
> goks...@gmail.com
>
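
For anyone following the thread without a Solr checkout, the behavior the test expects can be sketched with plain Java collections. This is only a toy model and an assumption about the intended semantics, not the SynonymFilter implementation: the class ToySynonymMatcher and its rewrite method are hypothetical names, and the rule "a b c,d" is simplified to "replace the phrase a b c with d". The point it illustrates is that when a multi-word match is abandoned partway (here "a" followed by "e"), the leading token should be emitted unchanged and the lookahead token should be emitted once, in its own position.

```java
import java.util.*;

// Toy model only: NOT Solr's SynonymFilter. All names here are hypothetical.
public class ToySynonymMatcher {

    // Rewrites the input tokens, replacing the longest phrase that matches at
    // each position. Tokens that do not start a complete match pass through
    // unchanged.
    static List<String> rewrite(List<String> input, Map<List<String>, String> rules) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < input.size()) {
            int bestLen = 0;
            String replacement = null;
            // Look for the longest rule phrase that fully matches at position i.
            for (Map.Entry<List<String>, String> rule : rules.entrySet()) {
                List<String> phrase = rule.getKey();
                int end = i + phrase.size();
                if (end <= input.size() && phrase.size() > bestLen
                        && input.subList(i, end).equals(phrase)) {
                    bestLen = phrase.size();
                    replacement = rule.getValue();
                }
            }
            if (bestLen > 0) {
                // Complete match: emit the replacement and skip the phrase.
                out.add(replacement);
                i += bestLen;
            } else {
                // No complete match: emit the original token untouched. The
                // lookahead tokens are not consumed, so "e" is seen again at
                // its own position and is emitted exactly once.
                out.add(input.get(i));
                i++;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<List<String>, String> rules = new HashMap<>();
        rules.put(Arrays.asList("a", "b", "c"), "d"); // simplified form of "a b c,d"

        System.out.println(rewrite(Arrays.asList("a", "e"), rules));           // [a, e]
        System.out.println(rewrite(Arrays.asList("a", "b", "c", "e"), rules)); // [d, e]
    }
}
```

In this model the input "a e" comes back as [a, e], which is what the assertion in the test case expects; the reported ["e","e"] corresponds to the leading token being overwritten by the lookahead token instead of being restored.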