On May 27, 2010, at 11:52 AM, Drew Farris wrote:

>
> Not at all.
>
> The alternative that's been discussed here in the past would involve some
> custom analyzer work. The general idea is to load the output from the
> CollocDriver into a bloom filter and then when processing documents at
> indexing time, set up a field where you generate shingles and only index
> those that appear in the bloom filter. This way you wind up getting a set of
> ngrams indexed that are ranked high across the entire corpus instead of
> simply the best ones for each document.
>

I'd be happy with the best ones for each doc at this point.

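For reference, here is a rough sketch of the corpus-wide bloom filter approach Drew describes, written in Python for brevity even though Mahout and Lucene are Java. It assumes the CollocDriver output has already been dumped to a plain list of top-ranked n-gram strings. The names BloomFilter, shingles, build_filter and shingles_to_index are hypothetical; this is not Lucene's ShingleFilter or any Mahout API. A Bloom filter never drops an n-gram that was added, but it can return false positives, so a small fraction of shingles outside the top list may still get indexed.

```python
import hashlib
from typing import Iterable, Iterator, List


class BloomFilter:
    """Minimal Bloom filter (hypothetical helper, not a Mahout/Lucene class)."""

    def __init__(self, num_bits: int = 1 << 20, num_hashes: int = 5) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str) -> Iterator[int]:
        # Double hashing: derive k bit positions from one SHA-256 digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # odd step size
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        # True if every bit is set; may be a false positive, never a false negative.
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


def shingles(tokens: List[str], min_n: int = 2, max_n: int = 3) -> Iterator[str]:
    """Yield all word n-grams (shingles) of length min_n..max_n, in document order per n."""
    for n in range(min_n, max_n + 1):
        for i in range(len(tokens) - n + 1):
            yield " ".join(tokens[i:i + n])


def build_filter(collocations: Iterable[str], **kwargs) -> BloomFilter:
    """Load the corpus-wide top n-grams (assumed already extracted) into a filter."""
    bloom = BloomFilter(**kwargs)
    for gram in collocations:
        bloom.add(gram)
    return bloom


def shingles_to_index(tokens: List[str], bloom: BloomFilter,
                      min_n: int = 2, max_n: int = 3) -> List[str]:
    """Keep only the document's shingles that appear in the corpus-wide filter."""
    return [s for s in shingles(tokens, min_n, max_n) if s in bloom]


if __name__ == "__main__":
    # Hypothetical top collocations standing in for real CollocDriver output.
    top_grams = ["new york", "machine learning", "new york city"]
    bloom = build_filter(top_grams)
    doc = "she moved to new york city to study machine learning".split()
    print(shingles_to_index(doc, bloom))
```

In a real index this filtering step would sit inside a custom analyzer on the shingle field, so only shingles that pass the filter become terms.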