Thanks for the pointers, Mike.  I'm trying to work out the math behind some
strange numbers we're seeing.  Here are the top dozen lines from a jmap
analysis of a heap dump:

Size (bytes)  Count     Class description
-----------------------------------------------------------
428246064     1792204   int[]
93175176      3213131   char[]
77195040      3216460   java.lang.String
67479112      3945      long[]
53073888      1658559   java.util.LinkedHashMap$Entry
39668352      1652848   org.apache.solr.search.HashDocSet
28195280      27131     byte[]
27165456      1697841   org.apache.lucene.index.Term
27024016      1689001   org.apache.lucene.search.TermQuery
22265920      695810    org.apache.lucene.document.Field
4931568       5974      java.lang.Object[]
4366768       77978     org.apache.lucene.store.FSIndexInput
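Dividing each Size by its Count gives the average shallow size per instance (the object itself, not what it references). A minimal Java sketch of that arithmetic, using a few rows copied from the table above:

```java
public class HistogramAverages {

    // One histogram row: total shallow bytes, instance count, class description.
    public record Row(long bytes, long count, String className) {}

    // Average shallow size of one instance of the class, in bytes.
    public static double averageBytes(Row r) {
        return (double) r.bytes() / r.count();
    }

    public static void main(String[] args) {
        // Rows copied from the jmap output above.
        Row[] rows = {
            new Row(428246064L, 1792204L, "int[]"),
            new Row(77195040L, 3216460L, "java.lang.String"),
            new Row(53073888L, 1658559L, "java.util.LinkedHashMap$Entry"),
            new Row(39668352L, 1652848L, "org.apache.solr.search.HashDocSet"),
            new Row(27024016L, 1689001L, "org.apache.lucene.search.TermQuery"),
        };
        for (Row r : rows) {
            System.out.printf("%-36s %8.2f bytes/instance%n", r.className(), averageBytes(r));
        }
    }
}
```

On these rows HashDocSet and String come out at exactly 24 bytes each, LinkedHashMap$Entry at 32 and TermQuery at 16, while int[] averages about 239 bytes, so the int[] total is dominated by array contents rather than by per-object headers.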

I see the HashDocSet count (1.65 million), assume each one holds a
reference to one of the int arrays (count=1.79 million), and wonder how I
could have so many of those in memory.  A few more data tidbits:

- Facet field Id1 = type int, unique values = 2710
- Facet field Id2 = type int, unique values = 65
- Facet field Id3 = type string, unique values = 15179

Thanks for the extra eyes on this, much appreciated.

-- j



On 4/2/07, Mike Klaas <[EMAIL PROTECTED]> wrote:

On 4/2/07, Jeff Rodenburg <[EMAIL PROTECTED]> wrote:
> With facet queries and the fields used, what qualifies as a "large" number
> of values?  The wiki uses U.S. states as an example, so the number of unique
> values = 50.  More to the point, is there an algorithm that I can use to
> estimate the cache consumption rate for facet queries?

The cache consumption rate is one entry per unique value in all
faceted fields, excluding fields that have faceting satisfied via
FieldCache (single-valued fields with exactly one token per document).
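That rule can be turned into a rough entry count. A minimal Java sketch, assuming the caller knows which faceted fields are served by the FieldCache (the `usesFieldCache` flag and all class and method names here are hypothetical), applied to the three facet fields listed in Jeff's message:

```java
import java.util.List;

public class FacetCacheEstimate {

    // usesFieldCache is an assumption supplied per field by the caller.
    public record FacetField(String name, long uniqueValues, boolean usesFieldCache) {}

    // One cache entry per unique value, skipping fields faceted via the FieldCache.
    public static long estimateCacheEntries(List<FacetField> fields) {
        long entries = 0;
        for (FacetField f : fields) {
            if (!f.usesFieldCache()) {
                entries += f.uniqueValues();
            }
        }
        return entries;
    }

    public static void main(String[] args) {
        // Hypothetically assume none of the three fields goes through the FieldCache.
        List<FacetField> fields = List.of(
            new FacetField("Id1", 2710, false),
            new FacetField("Id2", 65, false),
            new FacetField("Id3", 15179, false));
        System.out.println(estimateCacheEntries(fields)); // 17954
    }
}
```

Even with all three fields counted, this gives 17954 entries, far fewer than the 1.65 million HashDocSet instances in the histogram.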

The size of each cached filter is numDocs / 8 bytes (one bit per
document in the index), unless the number of matching docs is less than
the useHashSet threshold in solrconfig.xml.
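A sketch of the per-filter sizing rule in Java. The names `numDocs`, `matchingDocs` and `hashSetThreshold` are hypothetical stand-ins (the last for the useHashSet threshold in solrconfig.xml), and the hash-set branch assumes roughly 4 bytes per matching doc id, ignoring hash-table slack and object headers:

```java
public class FilterSizeEstimate {

    // Rough bytes for one cached filter under the assumptions stated above.
    public static long estimateFilterBytes(long numDocs, long matchingDocs, long hashSetThreshold) {
        if (matchingDocs < hashSetThreshold) {
            // Small result: hash-based set, assumed ~one int per matching doc (lower bound).
            return 4L * matchingDocs;
        }
        // Otherwise a bitset: one bit per document in the index, rounded up to whole bytes.
        return (numDocs + 7) / 8;
    }

    public static void main(String[] args) {
        long numDocs = 1_000_000L; // hypothetical index size
        long threshold = 3000L;    // hypothetical threshold value
        System.out.println(estimateFilterBytes(numDocs, 500_000L, threshold)); // 125000
        System.out.println(estimateFilterBytes(numDocs, 100L, threshold));     // 400
    }
}
```

With a hypothetical 1,000,000-document index, each bitset filter costs about 125,000 bytes no matter how many documents it matches.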

Sorting requires FieldCache population, which consists of an integer
per document plus the sum of the lengths of the unique values in the
field (less for pure int/float fields, but I'm not sure if Solr's sint
qualifies).
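The sorting estimate, sketched the same way for a string field. This assumes 4 bytes for the per-document integer and 2 bytes per char for the unique values (a char[]-backed String), ignores per-object overhead, and does not model the smaller footprint mentioned for pure int/float fields; all names are hypothetical:

```java
import java.util.List;

public class SortCacheEstimate {

    // One int per document plus the total length of the field's unique values.
    public static long estimateSortBytes(long numDocs, List<String> uniqueValues) {
        long ordBytes = 4L * numDocs; // one 4-byte int per document
        long chars = 0;
        for (String v : uniqueValues) {
            chars += v.length();
        }
        return ordBytes + 2L * chars; // assumed 2 bytes per char
    }

    public static void main(String[] args) {
        // Hypothetical: 10 docs, three distinct values (3 + 5 + 4 = 12 chars).
        System.out.println(estimateSortBytes(10, List.of("red", "green", "blue"))); // 64
    }
}
```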

Neither faceting nor sorting should consume more memory once their
data structures have been built, so it would be odd to see an OOM after
48 hours if they were the cause.

-Mike
