Re: Highest frequency terms for a subset of documents

Yonik Seeley Wed, 20 Apr 2011 17:03:12 -0700

On Wed, Apr 20, 2011 at 7:45 PM, Ofer Fort <o...@tra.cx> wrote:
> Thanks,
> but I've disabled the cache already, since my concern is speed and I'm
> willing to pay the price (memory)


Then you should not disable the cache.

> and my subsets are not fixed.
> Does the facet search do any extra work that I don't need, which I might be
> able to disable (either by a flag or by a code change)?
> Somehow I feel, or rather hope, that counting the terms of 200K documents
> and finding the top 500 should take less than 30 seconds.

Using facet.enum.cache.minDf should be a little faster than just
disabling the cache, since it's a different code path.
Using the cache selectively will speed things up: terms that match fewer
documents than minDf bypass the filterCache, so only the frequent terms
are cached. Try setting minDf to something like 1000.
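For reference, a request combining these settings might be built like this. It's a minimal sketch that only assembles the query string. The URL http://localhost:8983/solr/select, the field name "text", and the filter query "category:news" (standing in for the document subset) are all hypothetical, and build_facet_params is an illustrative helper, not part of any Solr client. Note that facet.enum.cache.minDf only takes effect when facet.method=enum.

```python
from urllib.parse import urlencode, parse_qs


def build_facet_params(subset_fq, field, limit=500, min_df=1000):
    """Hypothetical helper: build Solr request parameters that return the
    top `limit` terms of `field` over the documents matched by `subset_fq`."""
    params = [
        ("q", "*:*"),
        ("fq", subset_fq),            # restricts counting to the subset
        ("rows", "0"),                # only facet counts are needed, not documents
        ("facet", "true"),
        ("facet.field", field),
        ("facet.limit", str(limit)),  # top-N terms by count
        ("facet.mincount", "1"),      # skip terms that don't occur in the subset
        ("facet.method", "enum"),     # minDf below applies only to this method
        # filterCache is used only for terms whose doc frequency is >= min_df
        ("facet.enum.cache.minDf", str(min_df)),
    ]
    return urlencode(params)


query = build_facet_params("category:news", "text")
print("http://localhost:8983/solr/select?" + query)  # hypothetical endpoint
print(parse_qs(query)["facet.enum.cache.minDf"])
```

Raising or lowering min_df trades filterCache memory for speed on the frequent terms without turning the cache off entirely.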

How many unique terms do you have in the index?
Is this Solr 3.1? That release included some optimizations for the case
where there are many terms to iterate over.
You could also try trunk, which has even more optimizations, or the
bulkpostings branch if you really want to experiment.
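Why the number of unique terms matters can be seen in a simplified model of enum-style faceting: every term in the field is visited once, and its documents are intersected with the subset. The sketch below is a toy in-memory model under that assumption, not Solr's implementation; enum_facet_top_k and its postings/subset/cache arguments are hypothetical. Terms with a document frequency at or above min_df get a cached doc set (standing in for the filterCache), rarer terms are counted by walking their postings, and a bounded min-heap keeps the top k.

```python
import heapq


def enum_facet_top_k(postings, subset, k, min_df, cache=None):
    """Toy model of enum-style faceting (not Solr's code).

    postings: dict term -> list of doc ids containing the term
    subset:   set of doc ids matched by the base query/filter
    Returns up to k (count, term) pairs, highest count first.
    """
    if cache is None:
        cache = {}
    heap = []  # min-heap of (count, term), never more than k entries
    for term, docs in postings.items():  # one pass over every unique term
        if len(docs) >= min_df:
            # frequent term: reuse (or build) a cached doc set, like the filterCache
            doc_set = cache.get(term)
            if doc_set is None:
                doc_set = frozenset(docs)
                cache[term] = doc_set
            count = len(doc_set & subset)
        else:
            # rare term: walk its postings directly, no cache entry created
            count = sum(1 for d in docs if d in subset)
        if count == 0:
            continue
        if len(heap) < k:
            heapq.heappush(heap, (count, term))
        elif count > heap[0][0]:
            heapq.heapreplace(heap, (count, term))  # evict the current minimum
    return sorted(heap, key=lambda ct: (-ct[0], ct[1]))


postings = {
    "solr": [1, 2, 3, 4, 5],
    "lucene": [2, 3, 4],
    "facet": [1, 5],
    "cache": [6],
}
cache = {}
print(enum_facet_top_k(postings, {1, 2, 3, 5}, k=2, min_df=3, cache=cache))
# prints [(4, 'solr'), (2, 'lucene')]
```

The loop runs once per unique term regardless of how small the subset is, which is why a field with millions of distinct terms can dominate the cost even for a 200K-document subset.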

-Yonik
