[ 
https://issues.apache.org/jira/browse/SOLR-1670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12792980#action_12792980
 ] 

Robert Muir edited comment on SOLR-1670 at 12/20/09 10:33 AM:
--------------------------------------------------------------

bq. Robert, can you tell if the flaw is in the test code or the SynonymFilter? 
I'm pretty sure that such a bug (in the filter) wasn't originally there... so 
my money would be on the test - perhaps getTokList, but I haven't had a chance 
to look yet.

I am pretty sure there is a problem in both the existing test and the SynonymMap 
merging code, which is why I think there is some confusion.
The existing test implies that the output will be "ab", which makes me think 
the SynonymMap should merge these duplicate definitions.
The real output is "ab ab ab", but the test as written (expecting "ab") still 
passes with the existing setup; the problem is assertTokEqual().
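To make the failure mode concrete, here is a minimal plain-Java sketch (hypothetical names, not the actual Solr test code) of a comparison that only checks up to the shorter list, which is how an expectation of "ab" can pass against a real output of "ab ab ab":

{code}
import java.util.Arrays;
import java.util.List;

public class TokListComparison {
    // Hypothetical illustration of the flaw: comparing only up to the
    // shorter list's length, so extra tokens in the actual output go unchecked.
    static boolean lenientTokEqual(List<String> expected, List<String> actual) {
        int n = Math.min(expected.size(), actual.size());
        for (int i = 0; i < n; i++) {
            if (!expected.get(i).equals(actual.get(i))) return false;
        }
        return true; // never notices that actual has more tokens than expected
    }

    public static void main(String[] args) {
        List<String> expected = Arrays.asList("ab");
        List<String> actual = Arrays.asList("ab", "ab", "ab"); // the real filter output
        // Passes even though the two outputs clearly differ:
        System.out.println(lenientTokEqual(expected, actual)); // true
    }
}
{code}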

the rewritten test in SOLR-1674 looks like: 

{code}
assertTokenizesTo(map, "a b", new String[] { "ab", "ab", "ab"  });
{code}

where assertTokenizesTo is just

{code}
static void assertTokenizesTo(SynonymMap dict, String input,
      String expected[]) throws IOException {
    Tokenizer tokenizer = new WhitespaceTokenizer(new StringReader(input));
    SynonymFilter stream = new SynonymFilter(tokenizer, dict);
    assertTokenStreamContents(stream, expected);
  }
{code}

and assertTokenStreamContents is the one copied from Lucene.
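For contrast, here is a plain-Java sketch of the kind of strict check an assertTokenStreamContents-style assertion performs (illustrative only, not the actual Lucene code): every token must match, and the stream must end exactly where expected, so extra tokens fail too.

{code}
import java.util.Arrays;
import java.util.List;

public class StrictTokListComparison {
    // Strict comparison: element-by-element match AND exact length.
    static void assertTokListEqual(List<String> expected, List<String> actual) {
        for (int i = 0; i < expected.size(); i++) {
            if (i >= actual.size())
                throw new AssertionError("stream ended early at token " + i);
            if (!expected.get(i).equals(actual.get(i)))
                throw new AssertionError("token " + i + ": expected "
                        + expected.get(i) + " but got " + actual.get(i));
        }
        if (actual.size() > expected.size())
            throw new AssertionError("extra tokens after position " + expected.size());
    }

    public static void main(String[] args) {
        try {
            assertTokListEqual(Arrays.asList("ab"), Arrays.asList("ab", "ab", "ab"));
            System.out.println("passed");
        } catch (AssertionError e) {
            System.out.println("failed: " + e.getMessage()); // failed: extra tokens after position 1
        }
    }
}
{code}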

Also, as mentioned before, all of the position increment tests have a similar 
problem: the code and the tests are buggy in complementary ways, so everything 
appears to be working when it is not.

None of the original positionIncrements are being preserved as the tests imply.
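For reference, position increments encode positions relatively: an increment of 0 stacks a token on top of the previous one, which is how synonyms normally share a position. Here is a self-contained sketch (plain Java, no Lucene dependency; names are made up) of how increments map to absolute positions, and what correctly stacked repeats would look like:

{code}
import java.util.Arrays;

public class PosIncDemo {
    // Convert per-token position increments into absolute positions.
    static int[] absolutePositions(int[] increments) {
        int[] positions = new int[increments.length];
        int pos = -1; // first token with increment 1 lands at position 0
        for (int i = 0; i < increments.length; i++) {
            pos += increments[i];
            positions[i] = pos;
        }
        return positions;
    }

    public static void main(String[] args) {
        // "a b" -> "ab" emitted three times: if the duplicates were truly
        // stacked, increments would be {1, 0, 0} and all three tokens would
        // share position 0.
        int[] stacked = absolutePositions(new int[] {1, 0, 0});
        System.out.println(Arrays.toString(stacked)); // [0, 0, 0]
    }
}
{code}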



> synonymfilter/map repeat bug
> ----------------------------
>
>                 Key: SOLR-1670
>                 URL: https://issues.apache.org/jira/browse/SOLR-1670
>             Project: Solr
>          Issue Type: Bug
>          Components: Schema and Analysis
>    Affects Versions: 1.4
>            Reporter: Robert Muir
>         Attachments: SOLR-1670_test.patch
>
>
> as part of converting tests for SOLR-1657, I ran into a problem with 
> synonymfilter
> the test for 'repeats' has a flaw: it uses this assertTokEqual construct, 
> which does not really validate that two lists of tokens are equal; it just 
> stops at the shorter one.
> {code}
>     // repeats
>     map.add(strings("a b"), tokens("ab"), orig, merge);
>     map.add(strings("a b"), tokens("ab"), orig, merge);
>     assertTokEqual(getTokList(map,"a b",false), tokens("ab"));
>     /* in reality the result from getTokList is ab ab ab!!!!! */
> {code}
> when converted to assertTokenStreamContents this problem surfaced. attached 
> is an additional assertion to the existing testcase.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
