Re: Highest frequency terms for a subset of documents

Ofer Fort Wed, 20 Apr 2011 16:46:21 -0700

Thanks,
but I've already disabled the cache: my concern is speed, I'm willing to
pay the price (memory), and my subsets are not fixed.
Does the facet search do any extra work that I don't need and might be
able to disable (either with a flag or with a code change)?
Somehow I feel, or rather hope, that counting the terms of 200K documents
and finding the top 500 should take less than 30 seconds.
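For a sense of the raw work involved: the counting described above amounts to tallying, for each term, how many of the subset's documents contain it (which is what a facet count reports), then selecting the 500 largest tallies. Below is a minimal Java sketch that assumes the documents are already tokenized and held in memory as a hypothetical `Map<Integer, List<String>>`. It is not how Solr stores or walks its index; it only shows the counting pass plus a bounded min-heap for the top-N selection.

```java
import java.util.*;

/**
 * Sketch: count, for each term, how many documents in a subset contain it,
 * then keep the top n terms. The in-memory token lists are a hypothetical
 * representation for illustration, not a Solr or Lucene API.
 */
public class TopTerms {

    public static List<Map.Entry<String, Integer>> topTerms(
            Map<Integer, List<String>> docs, Set<Integer> subset, int n) {
        // 1. Document frequency within the subset: each term counted once per doc.
        Map<String, Integer> counts = new HashMap<>();
        for (int id : subset) {
            List<String> tokens = docs.get(id);
            if (tokens == null) continue;               // id not present in docs
            for (String term : new HashSet<>(tokens)) { // de-duplicate within the doc
                counts.merge(term, 1, Integer::sum);
            }
        }
        // 2. Keep only the n largest counts in a min-heap of size n,
        //    i.e. O(T log n) for T distinct terms instead of a full sort.
        Comparator<Map.Entry<String, Integer>> byCount = Map.Entry.comparingByValue();
        PriorityQueue<Map.Entry<String, Integer>> heap = new PriorityQueue<>(byCount);
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            heap.offer(e);
            if (heap.size() > n) heap.poll();           // evict the current smallest
        }
        // 3. Highest count first.
        List<Map.Entry<String, Integer>> result = new ArrayList<>(heap);
        result.sort(byCount.reversed());
        return result;
    }

    public static void main(String[] args) {
        Map<Integer, List<String>> docs = new HashMap<>();
        docs.put(1, Arrays.asList("solr", "facet", "solr"));
        docs.put(2, Arrays.asList("solr", "lucene"));
        docs.put(3, Arrays.asList("solr", "facet"));
        docs.put(4, Arrays.asList("unrelated"));        // outside the subset
        Set<Integer> subset = new HashSet<>(Arrays.asList(1, 2, 3));
        System.out.println(topTerms(docs, subset, 2)); // prints [solr=3, facet=2]
    }
}
```

The heap keeps the selection step cheap relative to the counting pass, so in this model nearly all of the time goes into visiting each subset document's terms.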



On Thu, Apr 21, 2011 at 2:41 AM, Yonik Seeley <yo...@lucidimagination.com> wrote:

> On Wed, Apr 20, 2011 at 7:34 PM, Chris Hostetter
> <hossman_luc...@fucit.org> wrote:
> >
> > : Thanks, but that's what I started with, and it took even longer and
> > : threw this:
> > : Approaching too many values for UnInvertedField faceting on field 'text' : bucket size=15560140
> > : Approaching too many values for UnInvertedField faceting on field 'text' : bucket size=15619075
> > : Exception during facet counts:org.apache.solr.common.SolrException: Too many values for UnInvertedField faceting on field text
> >
> > Right ... facet.method=fc is a good default, but cases like full-text
> > faceting can cause it to seriously blow up the memory ... I didn't even
> > realize it was possible to get it to fail this way; I would have just
> > expected an OutOfMemoryError.
> >
> > facet.method=enum is probably your best bet in this situation precisely
> > because it does a linear scan over the terms ... it's slower because it's
> > safer.
> >
> > The one speedup you might be able to get is to ensure you don't use the
> > filterCache; that way you don't waste time constantly caching/overwriting
> > DocSets.
>
> Right, or use the filterCache only for high-df terms via
> http://wiki.apache.org/solr/SimpleFacetParameters#facet.enum.cache.minDf
>
> -Yonik
>
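The advice above (facet.method=enum as a linear scan over the terms, with facet.enum.cache.minDf deciding which terms go through the filterCache) can be pictured with the simplified Java model below. It is a sketch, not Solr's actual faceting code: postings are modeled as sorted `int[]` doc ids, the subset as a `java.util.BitSet`, and the "filterCache" as a plain `HashMap` reused across calls. The early skip of terms whose df cannot beat the current top-k minimum follows from the fact that a term's count within the subset can never exceed its df; whether Solr prunes in exactly this way is an assumption here. On the request side, the relevant parameters are `facet.field`, `facet.method=enum`, `facet.limit` and `facet.enum.cache.minDf`.

```java
import java.util.*;

/**
 * Simplified model of enum-style faceting (assumption: not Solr's real code).
 * Walks every term, intersects its postings with the subset, keeps the top
 * `limit` counts, and only caches doc sets for terms with df >= minDf.
 */
public class EnumFacetSketch {

    public static List<Map.Entry<String, Integer>> facet(
            SortedMap<String, int[]> postingsByTerm, // term -> sorted doc ids
            BitSet subset,                           // docs matching the query
            int limit,                               // plays the role of facet.limit
            int minDf,                               // plays the role of facet.enum.cache.minDf
            Map<String, BitSet> filterCache) {       // hypothetical cache, reused across calls
        if (limit <= 0) return new ArrayList<>();    // this sketch only supports a positive limit
        Comparator<Map.Entry<String, Integer>> byCount = Map.Entry.comparingByValue();
        PriorityQueue<Map.Entry<String, Integer>> top = new PriorityQueue<>(byCount);

        for (Map.Entry<String, int[]> e : postingsByTerm.entrySet()) {
            String term = e.getKey();
            int[] docs = e.getValue();
            int df = docs.length;
            // count <= df, so a full queue makes low-df terms unable to enter it.
            if (top.size() == limit && df <= top.peek().getValue()) continue;

            int count;
            if (df >= minDf) {
                // High-df term: build (or reuse) a cached doc set, then intersect.
                BitSet termDocs = filterCache.computeIfAbsent(term, t -> toBitSet(docs));
                BitSet both = (BitSet) termDocs.clone();
                both.and(subset);
                count = both.cardinality();
            } else {
                // Low-df term: scan its postings directly; no cache entry is created.
                count = 0;
                for (int doc : docs) if (subset.get(doc)) count++;
            }
            if (count == 0) continue;
            top.offer(new AbstractMap.SimpleEntry<>(term, count));
            if (top.size() > limit) top.poll();      // evict the current smallest
        }
        List<Map.Entry<String, Integer>> result = new ArrayList<>(top);
        result.sort(byCount.reversed());
        return result;
    }

    private static BitSet toBitSet(int[] docs) {
        BitSet bits = new BitSet();
        for (int doc : docs) bits.set(doc);
        return bits;
    }

    public static void main(String[] args) {
        SortedMap<String, int[]> index = new TreeMap<>();
        index.put("facet", new int[] {1, 3});
        index.put("lucene", new int[] {2});
        index.put("rare", new int[] {4});
        index.put("solr", new int[] {1, 2, 3, 4});
        BitSet subset = new BitSet();
        subset.set(1, 4);                                       // docs 1, 2, 3
        Map<String, BitSet> cache = new HashMap<>();
        System.out.println(facet(index, subset, 2, 2, cache)); // prints [solr=3, facet=2]
        System.out.println(new TreeSet<>(cache.keySet()));     // prints [facet, solr]
    }
}
```

In the example run, "rare" is skipped without touching its postings, and "lucene" (df below minDf) is counted without creating a cache entry. Raising minDf shrinks how much of the cache gets churned per request, which is the trade-off Yonik's suggestion points at.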
