My documents are user entries, so I'm guessing they vary a lot. Tomorrow I'll try 3.1 and also 4.0, and see if they bring an improvement. Thanks guys!
On Thu, Apr 21, 2011 at 3:02 AM, Yonik Seeley <yo...@lucidimagination.com> wrote:
> On Wed, Apr 20, 2011 at 7:45 PM, Ofer Fort <o...@tra.cx> wrote:
> > Thanks, but i've disabled the cache already, since my concern is speed
> > and i'm willing to pay the price (memory)
>
> Then you should not disable the cache.
>
> > , and my subsets are not fixed.
> > Does the facet search do any extra work that i don't need, that i might
> > be able to disable (either by a flag or by a code change)?
> > Somehow i feel, or rather hope, that counting the terms of 200K documents
> > and finding the top 500 should take less than 30 seconds.
>
> Using facet.enum.cache.minDf should be a little faster than just
> disabling the cache - it's a different code path.
> Using the cache selectively will speed things up, so try setting that
> minDf to 1000 or so for example.
>
> How many unique terms do you have in the index?
> Is this Solr 3.1? There were some optimizations when there were many
> terms to iterate over.
> You could also try trunk, which has even more optimizations, or the
> bulkpostings branch if you really want to experiment.
>
> -Yonik
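For anyone following along, here is a minimal sketch of the request parameters Yonik is describing: enum-method faceting with `facet.enum.cache.minDf=1000` so the filterCache is only used for terms whose document frequency is at least 1000, and `facet.limit=500` for the top 500 terms. The field name `body` is hypothetical — substitute your own facet field.

```python
from urllib.parse import urlencode

# Sketch of a Solr facet request, assuming a hypothetical field "body".
# facet.method=enum iterates terms and intersects with the base doc set;
# facet.enum.cache.minDf=1000 means only terms with docFreq >= 1000 use
# the filterCache, per Yonik's suggestion in the thread.
params = {
    "q": "*:*",
    "rows": 0,                       # we only want facet counts, not docs
    "facet": "true",
    "facet.field": "body",
    "facet.method": "enum",
    "facet.enum.cache.minDf": 1000,  # cache only for frequent terms
    "facet.limit": 500,              # top 500 terms, as in the thread
}
query = urlencode(params)
print(query)
```

You would append this query string to your Solr select handler URL, e.g. `http://localhost:8983/solr/select?` followed by the encoded parameters.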