BufferedTokenStream incorrect cloning
-------------------------------------
Key: SOLR-1662
URL: https://issues.apache.org/jira/browse/SOLR-1662
Project: Solr
Issue Type: Bug
Components: Schema and Analysis
Affects Versions: 1.4
Reporter: Robert Muir

As part of writing tests for SOLR-1657, I rewrote one of the base classes (BaseTokenTestCase) to use the new TokenStream API, with some additional safety checks.

{code}
public static String tsToString(TokenStream in) throws IOException {
  StringBuilder out = new StringBuilder();
  TermAttribute termAtt = (TermAttribute) in.addAttribute(TermAttribute.class);
  // Extra safety: enforce that state is not preserved between tokens
  // by clearing the attributes and assigning bogus values.
  in.clearAttributes();
  termAtt.setTermBuffer("bogusTerm");
  while (in.incrementToken()) {
    if (out.length() > 0)
      out.append(' ');
    out.append(termAtt.term());
    // Reset to bogus values again before the next token is produced.
    in.clearAttributes();
    termAtt.setTermBuffer("bogusTerm");
  }

  in.close();
  return out.toString();
}
{code}

Setting the term text to bogus values helps find bugs in tokenstreams that do not clear or clone properly. In this case there is a problem with the tokenstream AB_AAB_Stream in TestBufferedTokenStream: it converts A B -> A A B but does not clone the token, so the values get overwritten.

This can be fixed in two ways:
* BufferedTokenStream does the cloning
* subclasses are responsible for the cloning

The question is: which one should it be?
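
To make the failure mode concrete, here is a minimal, Lucene-free sketch of what goes wrong when a token is queued twice without cloning. Everything in it is an assumption for illustration: the Token class, expand(), and consume() are hypothetical stand-ins, not the actual BufferedTokenStream or AB_AAB_Stream code. consume() mimics the strict tsToString above by overwriting each token with a bogus term after reading it.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class CloneSketch {

    /** Hypothetical mutable token; a stand-in for illustration, not Lucene's Token. */
    public static final class Token {
        String term;
        Token(String term) { this.term = term; }
        Token copy() { return new Token(term); }
    }

    /**
     * Models a stream that rewrites "A B" to "A A B" by queueing the "A" token twice.
     * When cloneOnWrite is false, both queue entries are the very same object.
     */
    public static Deque<Token> expand(String[] terms, boolean cloneOnWrite) {
        Deque<Token> out = new ArrayDeque<>();
        for (int i = 0; i < terms.length; i++) {
            Token t = new Token(terms[i]);
            boolean followedByB = i + 1 < terms.length && "B".equals(terms[i + 1]);
            if ("A".equals(t.term) && followedByB) {
                out.add(cloneOnWrite ? t.copy() : t); // the extra "A"
            }
            out.add(t);
        }
        return out;
    }

    /** Models the strict consumer: read the term, then overwrite the token with a bogus value. */
    public static String consume(Deque<Token> tokens) {
        StringBuilder sb = new StringBuilder();
        for (Token t : tokens) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(t.term);
            t.term = "bogusTerm"; // simulates clearAttributes() + setTermBuffer("bogusTerm")
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String[] input = {"A", "B"};
        System.out.println(consume(expand(input, false))); // prints "A bogusTerm B"
        System.out.println(consume(expand(input, true)));  // prints "A A B"
    }
}
```

In this model the clone happens at the point where the token is written to the queue. That single line could live either in the base class or in each subclass, which is exactly the choice raised above.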