> is this the same 25,000,000 document index you mentioned before?

Yep.
> how big is your index on disk? are you faceting or sorting on
> other fields as well?

Running 'du -h' on my index directory returns 86G. We facet on almost all of our index fields (they were added to the index solely for that purpose, otherwise we'd remove them). Here's the meaty part of the config again:

<field name="id" type="string" indexed="true" stored="true" />
<field name="content_date" type="date" indexed="true" stored="true" />
<field name="media_type" type="string" indexed="true" stored="true" />
<field name="location" type="string" indexed="true" stored="true" />
<field name="country_code" type="string" indexed="true" stored="true" />
<field name="text" type="text" indexed="true" stored="true" multiValued="true" />
<field name="content_source" type="string" indexed="true" stored="true" />
<field name="title" type="string" indexed="true" stored="true" />
<field name="site_id" type="string" indexed="true" stored="true" />
<field name="journalist_id" type="string" indexed="true" stored="true" />
<field name="blog_url" type="string" indexed="true" stored="true" />
<field name="created_date" type="date" indexed="true" stored="true" />

I'm sure we could stop storing many of these columns, especially if someone told me that would make a big difference (see the P.S. below for what that change might look like).

> what does the LukeRequest Handler tell you about the # of
> distinct terms in each field that you facet on?

Where would I find that? (There's a guess at the LukeRequestHandler request in a P.S. below.) I could probably estimate it myself on a per-column basis: it ranges from 4 distinct values for media_type, to 30-ish for location, to 200-ish for country_code, to almost 10,000 for site_id, to almost 100,000 for journalist_id.

Thanks very much for your help so far, Chris!

Dave

> -----Original Message-----
> From: Chris Hostetter [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, October 09, 2007 2:48 PM
> To: solr-user
> Subject: Re: Facets and running out of Heap Space
>
>
> : So, naturally we increased the heap size and things worked
> : well for a while and then the errors would happen again.
> : We've increased the initial heap size to 2.5GB and it's
> : still happening.
>
> is this the same 25,000,000 document index you mentioned before?
>
> 2.5GB of heap doesn't seem like much if you are also doing
> faceting ... even if you are faceting on an int field, there's
> going to be 95MB of FieldCache for that field (25,000,000 docs
> x 4 bytes per int). You said this was a string field, so it's
> going to be 95MB + however much space is needed for all the
> terms. Presumably, if you are faceting on this field, not every
> doc has a unique value, but even assuming a conservative 10%
> unique values of 10 characters each, that's another ~50MB, so
> we're up to about 150MB of FieldCache to facet that one field --
> and we haven't even started talking about how big the index
> itself is, how big the filterCache gets, or how many other
> fields you are faceting on.
>
> how big is your index on disk? are you faceting or sorting on
> other fields as well?
>
> what does the LukeRequest Handler tell you about the # of
> distinct terms in each field that you facet on?
>
>
> -Hoss
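
P.S. On dropping the stored values: below is a rough sketch of what I imagine the facet-only fields would look like if we stopped storing them. The stored="false" flags are my assumption, and I haven't verified that nothing in our app reads these values back out of search results:

<field name="media_type" type="string" indexed="true" stored="false" />
<field name="country_code" type="string" indexed="true" stored="false" />
<field name="site_id" type="string" indexed="true" stored="false" />
<field name="journalist_id" type="string" indexed="true" stored="false" />

My understanding is that this would mainly shrink the stored-field files on disk. The heap used for faceting comes from the FieldCache built over the indexed terms, so each string field we facet on would presumably still need roughly the 95MB per field that Chris works out above, plus the space for its unique terms.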
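
P.S. On the LukeRequestHandler question: my guess (untested, and the path and parameter names may differ in our Solr version) is that the handler gets registered in solrconfig.xml and is then queried over HTTP for per-field details, including distinct term counts. Something like:

<!-- solrconfig.xml: the path name and the need to register it are my guess; it may already be set up -->
<requestHandler name="/admin/luke" class="org.apache.solr.handler.admin.LukeRequestHandler" />

followed by a request along these lines (assuming the default example port):

http://localhost:8983/solr/admin/luke?fl=site_id,journalist_id&numTerms=10

If that does report a distinct-term count per field, it would confirm (or correct) the per-column estimates above.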