[ 
https://issues.apache.org/jira/browse/SOLR-1662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12791374#action_12791374
 ] 

Robert Muir commented on SOLR-1662:
-----------------------------------

bq. but Robert told me that e.g. RemoveDuplicates has a similar problem.

Right, there is no cloning in RemoveDuplicates. CommonGrams creates a new 
Token() when it grams, but its not clear that this one is correct either.

So if we decide its the responsibility of the subclass, these implementations 
need thorough tests to see if they are ok or not.
If we add the cloning to BufferedTokenStream itself, then we know they are 
ok... 


> BufferedTokenStream incorrect cloning
> -------------------------------------
>
>                 Key: SOLR-1662
>                 URL: https://issues.apache.org/jira/browse/SOLR-1662
>             Project: Solr
>          Issue Type: Bug
>          Components: Schema and Analysis
>    Affects Versions: 1.4
>            Reporter: Robert Muir
>
> As part of writing tests for SOLR-1657, I rewrote one of the base classes 
> (BaseTokenTestCase) to use the new TokenStream API, but also with some 
> additional safety.
> {code}
>  public static String tsToString(TokenStream in) throws IOException {
>     StringBuilder out = new StringBuilder();
>     TermAttribute termAtt = (TermAttribute) 
> in.addAttribute(TermAttribute.class);
>     // extra safety to enforce, that the state is not preserved and also
>     // assign bogus values
>     in.clearAttributes();
>     termAtt.setTermBuffer("bogusTerm");
>     while (in.incrementToken()) {
>       if (out.length() > 0)
>         out.append(' ');
>       out.append(termAtt.term());
>       in.clearAttributes();
>       termAtt.setTermBuffer("bogusTerm");
>     }
>     in.close();
>     return out.toString();
>   }
> {code}
> Setting the term text to bogus values helps find bugs in tokenstreams that do 
> not clear or clone properly. In this case there is a problem with a 
> tokenstream AB_AAB_Stream in TestBufferedTokenStream, it converts A B -> A A 
> B but does not clone, so the values get overwritten.
> This can be fixed in two ways: 
> * BufferedTokenStream does the cloning
> * subclasses are responsible for the cloning
> The question is which one should it be?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply via email to