It looks like the inner loop of org.apache.solr.analysis.RemoveDuplicatesTokenFilter could use a 'break'. I don't remember enough Big-O analysis to state the difference precisely, but the loop with and without the 'break' will have two different cost formulae.
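To make the point concrete, here is a minimal sketch of the two loop shapes. This is not the actual Solr source: the class name DuplicateScanSketch, both method names, the comparison counter, and the use of plain strings in place of tokens are all assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the filter's inner loop; not the Solr code.
public class DuplicateScanSketch {
    // Counts equals() calls so the two variants can be compared.
    static int comparisons;

    // Variant without 'break': keeps scanning after a match is found.
    static boolean isDuplicateFullScan(List<String> buffered, String term) {
        boolean dup = false;
        for (String seen : buffered) {
            comparisons++;
            if (seen.equals(term)) {
                dup = true; // no break, so the rest of the buffer is still visited
            }
        }
        return dup;
    }

    // Variant with 'break': stops at the first match.
    static boolean isDuplicateWithBreak(List<String> buffered, String term) {
        boolean dup = false;
        for (String seen : buffered) {
            comparisons++;
            if (seen.equals(term)) {
                dup = true;
                break; // nothing later in the buffer can change the answer
            }
        }
        return dup;
    }

    public static void main(String[] args) {
        List<String> buffered = Arrays.asList("solr", "lucene", "token", "filter");

        comparisons = 0;
        boolean full = isDuplicateFullScan(buffered, "solr");
        int fullCount = comparisons;

        comparisons = 0;
        boolean early = isDuplicateWithBreak(buffered, "solr");
        int earlyCount = comparisons;

        System.out.println(full + " after " + fullCount + " comparisons (full scan)");
        System.out.println(early + " after " + earlyCount + " comparisons (with break)");
    }
}
```

Both variants return the same answer. The 'break' only saves work when a match exists; when the term is not in the buffer, both scan every entry, so the worst case per lookup stays linear in the buffer size.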
For people working with very large documents (I've heard of gigabytes for email forensics) this would matter...
-- Lance Norskog goks...@gmail.com