According to Yonik, I can't use minDf because I'm faceting on a string field. I'm thinking of changing it to a tokenized type so that I can utilize this setting, but then I'll have to rebuild my entire index. Unless there's some way around that?
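
In case it helps, this is roughly the schema.xml change I have in mind -- just a sketch, untested, the type name "facetString" is only a placeholder, and I'm not certain it's enough to flip faceting over to the term-enum code path where minDf applies:

    <!-- sketch: a tokenized type that keeps each value as a single token,
         so it should behave like "string" for faceting purposes -->
    <fieldType name="facetString" class="solr.TextField" omitNorms="true">
      <analyzer>
        <tokenizer class="solr.KeywordTokenizerFactory"/>
      </analyzer>
    </fieldType>

    <!-- e.g. my journalist_id field, switched over from type="string" -->
    <field name="journalist_id" type="facetString" indexed="true" stored="false"/>

If anyone can confirm (or correct) that before I commit to a full reindex, I'd appreciate it.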
> -----Original Message-----
> From: Mike Klaas [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, October 10, 2007 4:56 PM
> To: solr-user@lucene.apache.org
> Cc: stuhood
> Subject: Re: Facets and running out of Heap Space
>
> On 10-Oct-07, at 12:19 PM, David Whalen wrote:
>
> > It looks now like I can't use facets the way I was hoping to
> > because the memory requirements are impractical.
>
> I can't remember if this has been mentioned, but upping the
> HashDocSet size is one way to reduce memory consumption.
> Whether this will work well depends greatly on the cardinality
> of your facet sets. facet.enum.cache.minDf set high is another
> option (will not generate a bitset for any value whose facet
> set is less than this value).
>
> Both options have performance implications.
>
> > So, as an alternative I was thinking I could get counts by
> > doing rows=0 and using filter queries.
> >
> > Is there a reason to think that this might perform better?
> > Or, am I simply moving the problem to another step in the process?
>
> Running one query per unique facet value seems impractical,
> if that is what you are suggesting. Setting minDf to a very
> high value should always outperform such an approach.
>
> -Mike
>
> > DW
> >
> >> -----Original Message-----
> >> From: Stu Hood [mailto:[EMAIL PROTECTED]
> >> Sent: Tuesday, October 09, 2007 10:53 PM
> >> To: solr-user@lucene.apache.org
> >> Subject: Re: Facets and running out of Heap Space
> >>
> >>> Using the filter cache method on the things like media type and
> >>> location; this will occupy ~2.3MB of memory _per unique value_
> >>
> >> Mike, how did you calculate that value? I'm trying to tune my
> >> caches, and any equations that could be used to determine some
> >> balanced settings would be extremely helpful. I'm in a memory
> >> limited environment, so I can't afford to throw a ton of cache
> >> at the problem.
> >>
> >> (I don't want to thread-jack, but I'm also wondering whether
> >> anyone has any notes on how to tune cache sizes for the
> >> filterCache, queryResultCache and documentCache).
> >>
> >> Thanks,
> >> Stu
> >>
> >>
> >> -----Original Message-----
> >> From: Mike Klaas <[EMAIL PROTECTED]>
> >> Sent: Tuesday, October 9, 2007 9:30pm
> >> To: solr-user@lucene.apache.org
> >> Subject: Re: Facets and running out of Heap Space
> >>
> >> On 9-Oct-07, at 12:36 PM, David Whalen wrote:
> >>
> >>> (snip)
> >>> I'm sure we could stop storing many of these columns,
> >>> especially if someone told me that would make a big difference.
> >>
> >> I don't think that it would make a difference in memory
> >> consumption, but storage is certainly not necessary for faceting.
> >> Extra stored fields can slow down search if they are large (in
> >> terms of bytes), but don't really occupy extra memory, unless
> >> they are polluting the doc cache. Does 'text' need to be stored?
> >>>
> >>>> what does the LukeRequest Handler tell you about the # of
> >>>> distinct terms in each field that you facet on?
> >>>
> >>> Where would I find that? I could probably estimate that myself
> >>> on a per-column basis. It ranges from 4 distinct values for
> >>> media_type to 30-ish for location to 200-ish for country_code
> >>> to almost 10,000 for site_id to almost 100,000 for journalist_id.
> >>
> >> Using the filter cache method on the things like media type and
> >> location; this will occupy ~2.3MB of memory _per unique value_,
> >> so it should be a net win for those (although quite close in
> >> space requirements for a 30-ary field on your index size).
> >>
> >> -Mike
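
PS: for what it's worth, my (possibly wrong) understanding of the ~2.3MB figure quoted above is that a cached filter stored as a bitset costs roughly maxDoc/8 bytes, so ~2.3MB per entry works out to an index on the order of 19 million documents; filters matching fewer docs than the HashDocSet threshold are kept as hash sets instead, which scale with the number of matching documents rather than with the index size. These are the solrconfig.xml knobs I'm looking at -- sketch only, and the numbers are guesses for my index, not recommendations:

    <query>
      <!-- filters matching fewer than maxSize docs are kept as hash sets
           (memory ~ number of matching docs) instead of bitsets
           (memory ~ maxDoc/8 bytes) -->
      <HashDocSet maxSize="10000" loadFactor="0.75"/>

      <!-- one entry per cached filter; worst case each entry is a bitset,
           so the footprint is roughly size * (maxDoc/8) bytes -->
      <filterCache class="solr.LRUCache" size="512"
                   initialSize="512" autowarmCount="128"/>
    </query>

If I'm reading Mike's suggestion right, raising maxSize mostly helps the high-cardinality fields (site_id, journalist_id), where each individual value matches relatively few documents.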