[ https://issues.apache.org/jira/browse/SOLR-1662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12791370#action_12791370 ]
Uwe Schindler commented on SOLR-1662:
-------------------------------------

Just a short description from the Lucene API side: Lucene's documentation of TokenStream.next() says: "The returned Token is a 'full private copy' (not re-used across calls to next())". AB_AAB_Stream.process() duplicates the token by simply putting it, uncloned, into the outQueue. Because the consumer of the BufferedTokenStream assumes that the Token is private, it is allowed to change it, and in doing so it also changes the token sitting in the outQueue. If you, e.g., put another TokenFilter in front of this AB_AAB_Stream and modify the token there, it would break.

In my opinion, the responsibility to clone lies with AB_AAB_Stream; BufferedTokenStream will never return the same token twice by itself. So it's a bug in the test. But Robert told me that, e.g., RemoveDuplicates has a similar problem.

The general contract for writing such streams is: whenever you return a Token from next(), never put it somewhere else uncloned, because the caller can change it. The fix is:

  write((Token) t.clone());

> BufferedTokenStream incorrect cloning
> -------------------------------------
>
>                 Key: SOLR-1662
>                 URL: https://issues.apache.org/jira/browse/SOLR-1662
>             Project: Solr
>          Issue Type: Bug
>          Components: Schema and Analysis
>    Affects Versions: 1.4
>            Reporter: Robert Muir
>
> As part of writing tests for SOLR-1657, I rewrote one of the base classes
> (BaseTokenTestCase) to use the new TokenStream API, but also with some
> additional safety.
> {code}
> import java.io.IOException;
> import org.apache.lucene.analysis.TokenStream;
> import org.apache.lucene.analysis.tokenattributes.TermAttribute;
>
> public static String tsToString(TokenStream in) throws IOException {
>   StringBuilder out = new StringBuilder();
>   TermAttribute termAtt = (TermAttribute) in.addAttribute(TermAttribute.class);
>   // extra safety: clear the attributes so that no state is preserved between
>   // tokens, and fill in a bogus term that a well-behaved stream must overwrite
>   in.clearAttributes();
>   termAtt.setTermBuffer("bogusTerm");
>   while (in.incrementToken()) {
>     if (out.length() > 0)
>       out.append(' ');
>     out.append(termAtt.term());
>     // reset again before the next token for the same reason
>     in.clearAttributes();
>     termAtt.setTermBuffer("bogusTerm");
>   }
>   in.close();
>   return out.toString();
> }
> {code}
> Setting the term text to bogus values helps find bugs in token streams that do
> not clear or clone properly. In this case there is a problem with the token
> stream AB_AAB_Stream in TestBufferedTokenStream: it converts A B -> A A B but
> does not clone, so the values get overwritten.
> This can be fixed in two ways:
> * BufferedTokenStream does the cloning
> * subclasses are responsible for the cloning
> The question is: which one should it be?

-- 
This message is automatically generated by JIRA.
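To make the aliasing problem concrete, here is a minimal, self-contained sketch in plain Java. It does not use Lucene and is not BufferedTokenStream's actual implementation: SimpleToken, SimpleBufferedStream, AbToAab, consume() and run() are hypothetical stand-ins that only model the behaviour discussed in this issue (a mutable token, a buffered stream with peek()/write() and an output queue, and a consumer that overwrites every token it receives, the way tsToString resets the term to "bogusTerm"). With cloneWrites set to false, the subclass queues the same object it returns, so the consumer's overwrite leaks into the duplicate; with the clone, the duplicate keeps its own term.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.LinkedList;

public class CloneContractDemo {
    // Hypothetical stand-in for Lucene's Token: only a mutable term.
    static final class SimpleToken implements Cloneable {
        String term;
        SimpleToken(String term) { this.term = term; }

        @Override public SimpleToken clone() {
            try {
                return (SimpleToken) super.clone();
            } catch (CloneNotSupportedException e) {
                throw new AssertionError(e); // cannot happen: we implement Cloneable
            }
        }
    }

    // Simplified model of a buffered stream: subclasses implement process(),
    // can look ahead with peek(), and can emit extra tokens with write().
    static abstract class SimpleBufferedStream {
        private final Iterator<SimpleToken> input;
        private final LinkedList<SimpleToken> inQueue = new LinkedList<>();
        private final LinkedList<SimpleToken> outQueue = new LinkedList<>();

        SimpleBufferedStream(Iterator<SimpleToken> input) { this.input = input; }

        protected abstract SimpleToken process(SimpleToken t);

        // Returns the n-th upcoming token (1-based) without consuming it, or null.
        protected SimpleToken peek(int n) {
            while (inQueue.size() < n) {
                if (!input.hasNext()) return null;
                inQueue.addLast(input.next());
            }
            return inQueue.get(n - 1);
        }

        private SimpleToken read() {
            if (!inQueue.isEmpty()) return inQueue.removeFirst();
            return input.hasNext() ? input.next() : null;
        }

        // Queues a token for a later call to next(); no copy is made here.
        protected void write(SimpleToken t) { outQueue.addLast(t); }

        SimpleToken next() {
            while (true) {
                if (!outQueue.isEmpty()) return outQueue.removeFirst();
                SimpleToken t = read();
                if (t == null) return null;
                SimpleToken out = process(t);
                if (out != null) return out;
            }
        }
    }

    // Turns "A B" into "A A B". With cloneWrites == false it queues the very
    // object it also returns, which is the aliasing bug described in this issue.
    static final class AbToAab extends SimpleBufferedStream {
        private final boolean cloneWrites;

        AbToAab(Iterator<SimpleToken> input, boolean cloneWrites) {
            super(input);
            this.cloneWrites = cloneWrites;
        }

        @Override protected SimpleToken process(SimpleToken t) {
            SimpleToken following = peek(1);
            if ("A".equals(t.term) && following != null && "B".equals(following.term)) {
                write(cloneWrites ? t.clone() : t);
            }
            return t;
        }
    }

    // Like tsToString: records each term, then overwrites the token it was handed,
    // which a consumer may do if every returned token is a private copy.
    static String consume(SimpleBufferedStream in) {
        StringBuilder out = new StringBuilder();
        for (SimpleToken t = in.next(); t != null; t = in.next()) {
            if (out.length() > 0) out.append(' ');
            out.append(t.term);
            t.term = "bogusTerm";
        }
        return out.toString();
    }

    public static String run(boolean cloneWrites, String... terms) {
        Iterator<SimpleToken> input = Arrays.stream(terms).map(SimpleToken::new).iterator();
        return consume(new AbToAab(input, cloneWrites));
    }

    public static void main(String[] args) {
        System.out.println(run(false, "A", "B", "C")); // prints: A bogusTerm B C
        System.out.println(run(true, "A", "B", "C"));  // prints: A A B C
    }
}
```

This sketch puts the clone in the subclass, matching the suggestion in the comment above. The other option raised in the issue, cloning inside the base class (for example in write()), would protect every subclass at the cost of one extra copy per queued token; this toy model only illustrates the failure and does not settle which design Solr should adopt.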