Re: RemoveDuplicatesTokenFilter redundancy problem?

Robert Muir Tue, 22 Dec 2009 17:52:12 -0800

another option: instead of looking ahead, with the weird Big-O runtime you
noticed, use a set to keep track of which terms have been seen at the current
position (cleared whenever a word with posInc>0 comes along).
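
a rough sketch of the set idea in plain java, just to illustrate it. this is
not the patch itself: Tok and DedupSketch are made-up names, and a list of
(term, posInc) pairs stands in for the real token stream. the actual filter
would read the term and position increment from the stream instead.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for a token: term text plus position increment.
// This is NOT the Lucene/Solr token class.
final class Tok {
    final String term;
    final int posInc;

    Tok(String term, int posInc) {
        this.term = term;
        this.posInc = posInc;
    }
}

// Hypothetical helper, assumed for illustration only.
final class DedupSketch {
    // Drops any token whose term was already seen at the same position.
    static List<Tok> removeDuplicates(List<Tok> input) {
        List<Tok> out = new ArrayList<>();
        Set<String> seen = new HashSet<>();
        for (Tok t : input) {
            if (t.posInc > 0) {
                // A new position starts here, so forget the earlier terms.
                seen.clear();
            }
            // Set.add returns false when the term is already in the set.
            if (seen.add(t.term)) {
                out.add(t);
            }
        }
        return out;
    }
}
```

each token costs one hash lookup (constant time on average), so there is no
inner loop over the other tokens stacked at the same position.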


i already implemented this with the new TokenStream api and will plop the
patch on SOLR-1657

On Tue, Dec 22, 2009 at 7:47 PM, Lance Norskog <goks...@gmail.com> wrote:

> It looks like the inner loop of
> org.apache.solr.analysis.RemoveDuplicatesTokenFilter could use a
> 'break'. I don't remember enough Big-O analysis to give the
> difference, but they will be two different formulae.
>
> For people doing large documents (I've heard gigabytes for email
> forensics) this would matter...
>
> --
> Lance Norskog
> goks...@gmail.com
>



-- 
Robert Muir
rcm...@gmail.com
