According to Yonik, I can't use minDf because I'm faceting on a string field. I'm thinking of changing it to a tokenized type so that I can utilize this setting, but then I'll have to rebuild my entire index. Unless there's some way around that?
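
In case it helps, this is roughly the schema.xml change I have in mind -- just a sketch, untested, the type name "facetString" is only a placeholder, and I'm not certain it's enough to flip faceting over to the term-enum code path where minDf applies:

    <!-- sketch: a tokenized type that keeps each value as a single token,
         so it should behave like "string" for faceting purposes -->
    <fieldType name="facetString" class="solr.TextField" omitNorms="true">
      <analyzer>
        <tokenizer class="solr.KeywordTokenizerFactory"/>
      </analyzer>
    </fieldType>

    <!-- e.g. my journalist_id field, switched over from type="string" -->
    <field name="journalist_id" type="facetString" indexed="true" stored="false"/>

If anyone can confirm (or correct) that before I commit to a full reindex, I'd appreciate it.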
> -----Original Message-----
> From: Mike Klaas [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, October 10, 2007 4:56 PM
> To: solr-user@lucene.apache.org
> Cc: stuhood
> Subject: Re: Facets and running out of Heap Space
>
> On 10-Oct-07, at 12:19 PM, David Whalen wrote:
>
> > It looks now like I can't use facets the way I was hoping to
> > because the memory requirements are impractical.
>
> I can't remember if this has been mentioned, but upping the
> HashDocSet size is one way to reduce memory consumption.
> Whether this will work well depends greatly on the cardinality
> of your facet sets. facet.enum.cache.minDf set high is another
> option (will not generate a bitset for any value whose facet
> set is less than this value).
>
> Both options have performance implications.
>
> > So, as an alternative I was thinking I could get counts by
> > doing rows=0 and using filter queries.
> >
> > Is there a reason to think that this might perform better?
> > Or, am I simply moving the problem to another step in the process?
>
> Running one query per unique facet value seems impractical,
> if that is what you are suggesting. Setting minDf to a very
> high value should always outperform such an approach.
>
> -Mike
>
> > DW
> >
> >> -----Original Message-----
> >> From: Stu Hood [mailto:[EMAIL PROTECTED]
> >> Sent: Tuesday, October 09, 2007 10:53 PM
> >> To: solr-user@lucene.apache.org
> >> Subject: Re: Facets and running out of Heap Space
> >>
> >>> Using the filter cache method on the things like media type and
> >>> location; this will occupy ~2.3MB of memory _per unique value_
> >>
> >> Mike, how did you calculate that value? I'm trying to tune my
> >> caches, and any equations that could be used to determine some
> >> balanced settings would be extremely helpful. I'm in a memory
> >> limited environment, so I can't afford to throw a ton of cache
> >> at the problem.
> >>
> >> (I don't want to thread-jack, but I'm also wondering whether
> >> anyone has any notes on how to tune cache sizes for the
> >> filterCache, queryResultCache and documentCache).
> >>
> >> Thanks,
> >> Stu
> >>
> >>
> >> -----Original Message-----
> >> From: Mike Klaas <[EMAIL PROTECTED]>
> >> Sent: Tuesday, October 9, 2007 9:30pm
> >> To: solr-user@lucene.apache.org
> >> Subject: Re: Facets and running out of Heap Space
> >>
> >> On 9-Oct-07, at 12:36 PM, David Whalen wrote:
> >>
> >>> (snip)
> >>> I'm sure we could stop storing many of these columns,
> >>> especially if someone told me that would make a big difference.
> >>
> >> I don't think that it would make a difference in memory
> >> consumption, but storage is certainly not necessary for faceting.
> >> Extra stored fields can slow down search if they are large (in
> >> terms of bytes), but don't really occupy extra memory, unless
> >> they are polluting the doc cache. Does 'text' need to be stored?
> >>>
> >>>> what does the LukeRequest Handler tell you about the # of
> >>>> distinct terms in each field that you facet on?
> >>>
> >>> Where would I find that? I could probably estimate that myself
> >>> on a per-column basis. It ranges from 4 distinct values for
> >>> media_type to 30-ish for location to 200-ish for country_code
> >>> to almost 10,000 for site_id to almost 100,000 for journalist_id.
> >>
> >> Using the filter cache method on the things like media type and
> >> location; this will occupy ~2.3MB of memory _per unique value_,
> >> so it should be a net win for those (although quite close in
> >> space requirements for a 30-ary field on your index size).
> >>
> >> -Mike
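
PS: for what it's worth, my (possibly wrong) understanding of the ~2.3MB figure quoted above is that a cached filter stored as a bitset costs roughly maxDoc/8 bytes, so ~2.3MB per entry works out to an index on the order of 19 million documents; filters matching fewer docs than the HashDocSet threshold are kept as hash sets instead, which scale with the number of matching documents rather than with the index size. These are the solrconfig.xml knobs I'm looking at -- sketch only, and the numbers are guesses for my index, not recommendations:

    <query>
      <!-- filters matching fewer than maxSize docs are kept as hash sets
           (memory ~ number of matching docs) instead of bitsets
           (memory ~ maxDoc/8 bytes) -->
      <HashDocSet maxSize="10000" loadFactor="0.75"/>

      <!-- one entry per cached filter; worst case each entry is a bitset,
           so the footprint is roughly size * (maxDoc/8) bytes -->
      <filterCache class="solr.LRUCache" size="512"
                   initialSize="512" autowarmCount="128"/>
    </query>

If I'm reading Mike's suggestion right, raising maxSize mostly helps the high-cardinality fields (site_id, journalist_id), where each individual value matches relatively few documents.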