Thanks for the pointers, Mike. I'm trying to work out the math to explain some strange numbers we're seeing. Here are the top dozen lines from a jmap analysis of a heap dump:

Size        Count    Class description
---------------------------------------------------------
428246064   1792204  int[]
93175176    3213131  char[]
77195040    3216460  java.lang.String
67479112    3945     long[]
53073888    1658559  java.util.LinkedHashMap$Entry
39668352    1652848  org.apache.solr.search.HashDocSet
28195280    27131    byte[]
27165456    1697841  org.apache.lucene.index.Term
27024016    1689001  org.apache.lucene.search.TermQuery
22265920    695810   org.apache.lucene.document.Field
4931568     5974     java.lang.Object[]
4366768     77978    org.apache.lucene.store.FSIndexInput

I see the HashDocSet numbers (count=1.65 million), assume they have references to the int arrays (count=1.79 million) and wonder how I could have so many of those in memory.

A few more data tidbits:
- Facet field Id1 = type int, unique values = 2710
- Facet field Id2 = type int, unique values = 65
- Facet field Id3 = type string, unique values = 15179

Thanks for the extra eyes on this, much appreciated.

-- j

On 4/2/07, Mike Klaas <[EMAIL PROTECTED]> wrote:
On 4/2/07, Jeff Rodenburg <[EMAIL PROTECTED]> wrote:
> With facet queries and the fields used, what qualifies as a "large" number
> of values? The wiki uses U.S. states as an example, so the number of unique
> values = 50. More to the point, is there an algorithm that I can use to
> estimate the cache consumption rate for facet queries?

The cache consumption rate is one entry per unique value in all faceted fields, excluding fields that have faceting satisfied via FieldCache (single-valued fields with exactly one token per document). The size of each cached filter is (num docs / 8) bytes, i.e. one bit per document, unless the number of matching docs is less than the useHashSet threshold in solrconfig.xml.

Sorting requires FieldCache population, which consists of an integer per document plus the sum of the lengths of the unique values in the field (less for pure int/float fields, but I'm not sure if Solr's sint qualifies).

Neither faceting nor sorting should consume more memory once their data structures have been built, so it would be odd to see OOM after 48 hours if they were the cause.

-Mike
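
The rule of thumb quoted above can be turned into a rough estimate. The following is a minimal Python sketch, not Solr code. It assumes that all three facet fields are served from the filter cache (none via FieldCache) and that every cached filter is a full bitset, so it gives an upper bound. The function names are made up for illustration, and NUM_DOCS is a hypothetical placeholder because the thread does not state the index size.

```python
# Rough upper-bound estimate of filter cache memory used by faceting,
# following the rule of thumb quoted above. Illustration only, not Solr code.

# Unique values per faceted field, taken from the message above.
FACET_FIELDS = {"Id1": 2710, "Id2": 65, "Id3": 15179}

# HYPOTHETICAL index size; the thread does not give the document count.
NUM_DOCS = 1_000_000


def facet_cache_entries(unique_values_per_field):
    """One cached filter per unique value across all faceted fields."""
    return sum(unique_values_per_field.values())


def bitset_filter_bytes(num_docs):
    """A bitset filter needs one bit per document, rounded up to whole bytes."""
    return (num_docs + 7) // 8


def worst_case_cache_bytes(num_docs, unique_values_per_field):
    """Upper bound: assumes every entry is a full bitset, none a hash set."""
    entries = facet_cache_entries(unique_values_per_field)
    return entries * bitset_filter_bytes(num_docs)


if __name__ == "__main__":
    entries = facet_cache_entries(FACET_FIELDS)
    print("expected facet cache entries:", entries)
    print("bytes per bitset filter:", bitset_filter_bytes(NUM_DOCS))
    print("worst-case cache bytes:", worst_case_cache_bytes(NUM_DOCS, FACET_FIELDS))

    # Compare with the heap dump: HashDocSet instances observed vs. expected entries.
    observed_hash_doc_sets = 1_652_848
    print("observed / expected:", round(observed_hash_doc_sets / entries, 1))
```

With the figures in this thread, the three facet fields account for 2710 + 65 + 15179 = 17954 expected entries, far fewer than the 1.65 million HashDocSet instances in the dump. If the assumptions hold, that gap suggests something other than one filter per facet value is populating the cache.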