[jira] Created: (SOLR-1662) BufferedTokenStream incorrect cloning

Robert Muir (JIRA) Wed, 16 Dec 2009 06:00:25 -0800

BufferedTokenStream incorrect cloning
-------------------------------------


                 Key: SOLR-1662
                 URL: https://issues.apache.org/jira/browse/SOLR-1662
             Project: Solr
          Issue Type: Bug
          Components: Schema and Analysis
    Affects Versions: 1.4
            Reporter: Robert Muir


As part of writing tests for SOLR-1657, I rewrote one of the base classes 
(BaseTokenTestCase) to use the new TokenStream API, but also with some 
additional safety.
{code}
 public static String tsToString(TokenStream in) throws IOException {
    StringBuilder out = new StringBuilder();
    TermAttribute termAtt = (TermAttribute) 
in.addAttribute(TermAttribute.class);
    // extra safety to enforce, that the state is not preserved and also
    // assign bogus values
    in.clearAttributes();
    termAtt.setTermBuffer("bogusTerm");
    while (in.incrementToken()) {
      if (out.length() > 0)
        out.append(' ');
      out.append(termAtt.term());
      in.clearAttributes();
      termAtt.setTermBuffer("bogusTerm");
    }

    in.close();
    return out.toString();
  }
{code}

Setting the term text to bogus values helps find bugs in tokenstreams that do 
not clear or clone properly. In this case there is a problem with a tokenstream 
AB_AAB_Stream in TestBufferedTokenStream, it converts A B -> A A B but does not 
clone, so the values get overwritten.

This can be fixed in two ways: 
* BufferedTokenStream does the cloning
* subclasses are responsible for the cloning

The question is which one should it be?


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Created: (SOLR-1662) BufferedTokenStream incorrect cloning

Reply via email to