Another option is, instead of looking ahead (with the weird Big-O runtime
you noticed), to use a set to keep track of which terms have been seen at
the current position, clearing it at each token with posInc>0.
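A minimal sketch of the set idea, assuming tokens are modeled as plain (term, position increment) pairs. The `Token` record and the `removeDuplicates` method are hypothetical illustrations, not the Lucene TokenStream API and not the patch on SOLR-1657. It needs Java 16+ for records.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class RemoveDuplicatesSketch {

    // Hypothetical stand-in for a token: term text plus position increment
    // (0 means "same position as the previous token").
    public record Token(String term, int posInc) {}

    // Drops tokens whose term was already kept at the same position.
    // The set is cleared whenever a token with posInc > 0 starts a new
    // position, so each token does one hash lookup instead of being
    // compared against every other token at that position.
    public static List<Token> removeDuplicates(List<Token> input) {
        List<Token> output = new ArrayList<>();
        Set<String> seenAtPosition = new HashSet<>();
        for (Token t : input) {
            if (t.posInc() > 0) {
                // New position: terms from the previous position no longer count.
                seenAtPosition.clear();
            }
            // Set.add returns false if the term is already in the set.
            if (seenAtPosition.add(t.term())) {
                output.add(t);
            }
        }
        return output;
    }

    public static void main(String[] args) {
        List<Token> tokens = List.of(
            new Token("fast", 1),
            new Token("quick", 0),
            new Token("fast", 0),   // same term, same position: dropped
            new Token("fast", 1));  // new position: kept
        for (Token t : removeDuplicates(tokens)) {
            System.out.println(t.term() + " posInc=" + t.posInc());
        }
    }
}
```

Because a token with posInc>0 always clears the set first, it is always kept; only posInc=0 tokens can be dropped, so no position information is lost.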

I already implemented this with the new TokenStream API and will plop the
patch on SOLR-1657.

On Tue, Dec 22, 2009 at 7:47 PM, Lance Norskog <goks...@gmail.com> wrote:

> It looks like the inner loop of
> org.apache.solr.analysis.RemoveDuplicatesTokenFilter could use a
> 'break'. I don't remember enough Big-O analysis to give the
> difference, but they will be two different formulae.
>
> For people doing large documents (I've heard gigabytes for email
> forensics) this would matter...
>
> --
> Lance Norskog
> goks...@gmail.com
>



-- 
Robert Muir
rcm...@gmail.com
