On May 27, 2010, at 11:52 AM, Drew Farris wrote:

> 
> Not at all.
> 
> The alternative that's been discussed here in the past would involve some
> custom analyzer work. The general idea is to load the output from the
> CollocDriver into a bloom filter and then when processing documents at
> indexing time, set up a field where you generate shingles and only index
> those that appear in the bloom filter. This way you wind up getting a set of
> ngrams indexed that are ranked high across the entire corpus instead of
> simply the best ones for each document.
> 
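A minimal sketch of the approach described above. It is written in Python rather than as a Lucene analyzer. It builds a Bloom filter from the corpus-wide, top-ranked ngrams, generates word shingles for a document, and keeps only the shingles the filter reports as present.

Assumptions, all hypothetical:
- the hard-coded ngram list stands in for CollocDriver output (reading that output is not shown);
- `BloomFilter` is a hand-rolled illustration, not a Mahout, Hadoop, or Lucene class;
- the shingle sizes (2 to 3 words) and filter parameters are arbitrary.

A Bloom filter can return false positives, so an occasional shingle outside the list may slip through. It never drops one that is in the list.

```python
import hashlib
from typing import Iterable, List


class BloomFilter:
    """Hypothetical minimal Bloom filter: a bit array probed at k positions per item."""

    def __init__(self, num_bits: int = 1 << 16, num_hashes: int = 4) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str) -> Iterable[int]:
        # Double hashing: derive k bit positions from two slices of one SHA-256 digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        # True if every probed bit is set: may be a false positive, never a false negative.
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


def shingles(tokens: List[str], min_size: int = 2, max_size: int = 3) -> List[str]:
    """Word n-grams ("shingles") of min_size..max_size tokens, joined by single spaces."""
    out = []
    for n in range(min_size, max_size + 1):
        for i in range(len(tokens) - n + 1):
            out.append(" ".join(tokens[i:i + n]))
    return out


def filtered_shingles(tokens: List[str], corpus_ngrams: BloomFilter) -> List[str]:
    """Keep only the shingles that the corpus-wide filter reports as present."""
    return [s for s in shingles(tokens) if s in corpus_ngrams]


if __name__ == "__main__":
    # Hypothetical stand-in for the top-ranked ngrams taken from CollocDriver output.
    top_ngrams = ["machine learning", "bloom filter", "new york city"]
    bf = BloomFilter()
    for gram in top_ngrams:
        bf.add(gram)

    doc = "a bloom filter helps machine learning in new york city".split()
    print(filtered_shingles(doc, bf))
```

The Bloom filter trades exactness for memory. The whole corpus-wide ngram set fits in a fixed-size bit array that can be handed to every indexing task, at the cost of a tunable false-positive rate.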

I'd be happy with the best ones for each doc at this point.
