Please add a Jira issue for this. It will get more attention there.

BTW, thanks for creating such a precise bug report.

On Mon, Sep 14, 2009 at 1:52 PM, Gregg Donovan <gregg...@gmail.com> wrote:
> I'm running into an odd issue with multi-word synonyms in Solr (using
> the latest [9/14/09] nightly). Things generally seem to work as
> expected, but I sometimes see words that are the leading term in a
> multi-word synonym being replaced with the token that follows them in
> the stream when they should just be ignored (i.e. there's no synonym
> match for just that token). When I preview the analysis at
> admin/analysis.jsp it looks fine, but at runtime I see problems like
> the one in the unit test below. It's a simple case, so I assume I'm
> making some sort of configuration and/or usage error.
>
> package org.apache.solr.analysis;
> import java.io.*;
> import java.util.*;
> import org.apache.lucene.analysis.WhitespaceTokenizer;
> import org.apache.lucene.analysis.tokenattributes.TermAttribute;
>
> public class TestMultiWordSynonyms extends junit.framework.TestCase {
>
>   public void testMultiWordSynonyms() throws IOException {
>     List<String> rules = new ArrayList<String>();
>     rules.add( "a b c,d" );
>     SynonymMap synMap = new SynonymMap( true );
>     SynonymFilterFactory.parseRules( rules, synMap, "=>", ",", true, null);
>
>     SynonymFilter ts = new SynonymFilter(
>         new WhitespaceTokenizer( new StringReader("a e")), synMap );
>     TermAttribute termAtt =
>         (TermAttribute) ts.getAttribute(TermAttribute.class);
>
>     ts.reset();
>     List<String> tokens = new ArrayList<String>();
>     while (ts.incrementToken()) tokens.add( termAtt.term() );
>
>     // This fails: the token stream actually yields ["e","e"]
>     assertEquals(Arrays.asList("a","e"), tokens);
>   }
> }
>
> Any help would be much appreciated. Thanks.
>
> --Gregg
>



-- 
Lance Norskog
goks...@gmail.com
