Re: Troubleshooting java heap out-of-memory
Hoping I can get a better response with a more directed question: With facet
queries and the fields used, what qualifies as a large number of values? The
wiki uses U.S. states as an example, so the number of unique values = 50. More
to the point, is there an algorithm that I can use to estimate the cache
consumption rate for facet queries?

-- j

On 4/1/07, Jeff Rodenburg [EMAIL PROTECTED] wrote:
> I've read through the list entries here, the Lucene list, and the wiki docs
> and am not resolving a major pain point for us. We've been trying to
> determine what could possibly cause us to hit this in our given environment,
> and am hoping more eyes on this issue can help.
>
> Our scenario: 150MB index, 14 documents, read/write servers in place using
> standard replication. Running Tomcat 5.5.17 on Redhat Enterprise Linux 4.
> Java configured to start with -Xmx1024m. We encounter java heap
> out-of-memory issues on the read server at staggered times, but usually once
> every 48 hours. Search request load is roughly 2 searches every 3 seconds,
> with some spikes here or there.
>
> We are using facets: 3 are based on type integer, one is based on type
> string. We are using sorts: 1 is based on type sint, 2 are based on type
> date. Caching is disabled. Solr bits are also from September 2006.
>
> Is there anything in that configuration that we should interrogate?
>
> thanks, j
Re: Troubleshooting java heap out-of-memory
On 4/2/07, Jeff Rodenburg [EMAIL PROTECTED] wrote:
> Hoping I can get a better response with a more directed question:

I haven't answered your original question as it seems that general java memory
debugging techniques would be the most useful thing here.

> With facet queries and the fields used, what qualifies as a large number of
> values? The wiki uses U.S. states as an example, so the number of unique
> values = 50. More to the point, is there an algorithm that I can use to
> estimate the cache consumption rate for facet queries?

The cache consumption rate is one entry per unique value in all faceted
fields, excluding fields that have faceting satisfied via FieldCache
(single-valued fields with exactly one token per document). The size of each
cached filter is num docs / 8 bytes, unless the number of matching docs is
less than the useHashSet threshold in solrconfig.xml.

Sorting requires FieldCache population, which consists of an integer per
document plus the sum of the lengths of the unique values in the field (less
for pure int/float fields, but I'm not sure if Solr's sint qualifies).

Both faceting and sorting shouldn't consume more memory after their data
structures have been built, so it would be odd to see OOM after 48 hours if
they were the cause.

-Mike
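To make that rule of thumb concrete, here is a rough back-of-envelope sketch
in Java. It only encodes the estimate described above; the constants (one bit
per document per bitset filter, 4 bytes per FieldCache int, 2 bytes per char)
and the example inputs are assumptions, not official Solr formulas:

// Back-of-envelope memory estimate for facet filters and sort FieldCache,
// encoding the rule of thumb above. Constants and inputs are assumptions.
public class FacetMemoryEstimate {

    // One cached bitset filter: one bit per document in the index.
    static long bitsetFilterBytes(long maxDoc) {
        return maxDoc / 8;
    }

    // Faceting a field: one filter per unique value, assuming every value
    // matches enough docs to be cached as a bitset (i.e. above the
    // useHashSet threshold in solrconfig.xml).
    static long facetFieldBytes(long maxDoc, long uniqueValues) {
        return uniqueValues * bitsetFilterBytes(maxDoc);
    }

    // Sorting a string field via FieldCache: one int per document plus the
    // characters of all unique values (2 bytes per Java char).
    static long sortFieldBytes(long maxDoc, long sumOfUniqueValueChars) {
        return maxDoc * 4L + sumOfUniqueValueChars * 2L;
    }

    public static void main(String[] args) {
        long maxDoc = 1000000; // hypothetical 1M-document index
        // The U.S.-states-style facet field from the wiki: 50 unique values.
        System.out.println("one filter:  " + bitsetFilterBytes(maxDoc) + " bytes");
        System.out.println("facet field: " + facetFieldBytes(maxDoc, 50) + " bytes");
        // Hypothetical sort field whose unique values total 1M characters.
        System.out.println("sort field:  " + sortFieldBytes(maxDoc, 1000000) + " bytes");
    }
}

On a hypothetical million-document index, the 50-state facet works out to
roughly 6 MB of filters; the case to watch is fields with thousands of unique
values, where the per-value filters multiply.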
Re: Troubleshooting java heap out-of-memory
On 4/1/07, Jeff Rodenburg [EMAIL PROTECTED] wrote:
> Our scenario: 150MB index, 14 documents, read/write servers in place using
> standard replication. Running Tomcat 5.5.17 on Redhat Enterprise Linux 4.
> Java configured to start with -Xmx1024m. We encounter java heap
> out-of-memory issues on the read server at staggered times, but usually once
> every 48 hours.

Could you do a grep through your server logs for WARNING, to eliminate the
possibility of multiple overlapping searchers causing the OOM issue?

Are you doing incremental updates? If so, try lowering your mergeFactor for
the index, or optimize more frequently. As an index is incrementally updated,
old docs are marked as deleted and new docs are added. This leaves holes in
the document id space which can increase memory usage. Both BitSet filters and
FieldCache entry sizes are proportional to maxDoc (the maximum internal docid
in the index). You can see maxDoc from the statistics page... there might be a
correlation.

-Yonik
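For the maxDoc check, a minimal sketch using the Lucene API current in that
era (IndexReader.open(String) and the maxDoc()/numDocs() accessors existed in
Lucene 2.x; the index path argument is an assumption):

import org.apache.lucene.index.IndexReader;

// Minimal sketch: compare maxDoc (docid-space size, including deleted
// holes) against numDocs (live documents) for a local index directory.
public class MaxDocCheck {
    public static void main(String[] args) throws Exception {
        IndexReader reader = IndexReader.open(args[0]); // path to index dir
        try {
            int maxDoc = reader.maxDoc();   // upper bound on internal docids
            int numDocs = reader.numDocs(); // live (non-deleted) documents
            System.out.println("maxDoc=" + maxDoc + ", numDocs=" + numDocs
                    + ", holes=" + (maxDoc - numDocs));
            // Each cached BitSet filter costs roughly maxDoc / 8 bytes, so a
            // large gap between maxDoc and numDocs inflates filter memory.
            System.out.println("per-filter bytes ~ " + (maxDoc / 8));
        } finally {
            reader.close();
        }
    }
}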
Re: Troubleshooting java heap out-of-memory
On 4/2/07, Yonik Seeley [EMAIL PROTECTED] wrote:
> On 4/1/07, Jeff Rodenburg [EMAIL PROTECTED] wrote:
> > Our scenario: 150MB index, 14 documents, read/write servers in place
> > using standard replication. Running Tomcat 5.5.17 on Redhat Enterprise
> > Linux 4. Java configured to start with -Xmx1024m. We encounter java heap
> > out-of-memory issues on the read server at staggered times, but usually
> > once every 48 hours.
>
> Could you do a grep through your server logs for WARNING, to eliminate the
> possibility of multiple overlapping searchers causing the OOM issue?

We're not seeing warnings for overlapping searchers prior to the oom events.
Only SEVERE -- java.lang.OutOfMemoryError: Java heap space.

> Are you doing incremental updates? If so, try lowering your mergeFactor for
> the index, or optimize more frequently. As an index is incrementally
> updated, old docs are marked as deleted and new docs are added. This leaves
> holes in the document id space which can increase memory usage. Both BitSet
> filters and FieldCache entry sizes are proportional to maxDoc (the maximum
> internal docid in the index). You can see maxDoc from the statistics
> page... there might be a correlation.

We are doing incremental updates, and we optimize quite a bit. mergeFactor
presently set to 10.

maxDoc count = 144156
numDocs count = 144145
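Plugging those numbers into the filter-size arithmetic (just the math from
earlier in the thread, not an official diagnostic):

// Arithmetic on the numbers reported above: the docid space here is
// nearly hole-free, so deleted docs are unlikely to be the problem.
public class ThreadNumbers {
    public static void main(String[] args) {
        int maxDoc = 144156, numDocs = 144145;
        System.out.println("holes: " + (maxDoc - numDocs));              // 11
        System.out.println("bitset filter: " + (maxDoc / 8) + " bytes"); // ~18 KB
        // With only 11 deleted-doc holes and ~18 KB per cached bitset
        // filter, the memory pressure must be coming from elsewhere.
    }
}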
Re: Troubleshooting java heap out-of-memory
Thanks for the pointers, Mike. I'm trying to determine the math to resolve
some strange numbers we're seeing. Here's the top dozen lines from a jmap
analysis on a heap dump:

Size       Count    Class description
---------------------------------------------------------
428246064  1792204  int[]
93175176   3213131  char[]
77195040   3216460  java.lang.String
67479112   3945     long[]
53073888   1658559  java.util.LinkedHashMap$Entry
39668352   1652848  org.apache.solr.search.HashDocSet
28195280   27131    byte[]
27165456   1697841  org.apache.lucene.index.Term
27024016   1689001  org.apache.lucene.search.TermQuery
22265920   695810   org.apache.lucene.document.Field
4931568    5974     java.lang.Object[]
4366768    77978    org.apache.lucene.store.FSIndexInput

I see the HashDocSet numbers (count = 1.65 million), assume they have
references to the int arrays (count = 1.79 million), and wonder how I could
have so many of those in memory. A few more data tidbits:

- Facet field Id1 = type int, unique values = 2710
- Facet field Id2 = type int, unique values = 65
- Facet field Id3 = type string, unique values = 15179

Thanks for the extra eyes on this, much appreciated.

-- j

On 4/2/07, Mike Klaas [EMAIL PROTECTED] wrote:
> On 4/2/07, Jeff Rodenburg [EMAIL PROTECTED] wrote:
> > With facet queries and the fields used, what qualifies as a large number
> > of values? The wiki uses U.S. states as an example, so the number of
> > unique values = 50. More to the point, is there an algorithm that I can
> > use to estimate the cache consumption rate for facet queries?
>
> The cache consumption rate is one entry per unique value in all faceted
> fields, excluding fields that have faceting satisfied via FieldCache
> (single-valued fields with exactly one token per document). The size of
> each cached filter is num docs / 8 bytes, unless the number of matching
> docs is less than the useHashSet threshold in solrconfig.xml.
>
> Sorting requires FieldCache population, which consists of an integer per
> document plus the sum of the lengths of the unique values in the field
> (less for pure int/float fields, but I'm not sure if Solr's sint
> qualifies).
>
> Both faceting and sorting shouldn't consume more memory after their data
> structures have been built, so it would be odd to see OOM after 48 hours
> if they were the cause.
>
> -Mike
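A rough reading of those jmap numbers (back-of-envelope; object header sizes
vary by JVM, so the derived per-array figure is approximate):

// Back-of-envelope reading of the jmap output above. Header sizes are
// JVM-dependent, so the derived figures are approximations.
public class HeapDumpMath {
    public static void main(String[] args) {
        long intArrayBytes = 428246064L, intArrayCount = 1792204L;
        long hashDocSetCount = 1652848L;
        long uniqueFacetValues = 2710L + 65L + 15179L; // the three facet fields
        System.out.println("avg int[] size: "
                + (intArrayBytes / intArrayCount) + " bytes"); // ~238 bytes
        // Assuming ~16 bytes of array header, that is roughly 55 docids per
        // array -- consistent with many small HashDocSet filters.
        System.out.println("HashDocSets: " + hashDocSetCount
                + " vs unique facet values: " + uniqueFacetValues);
        // 1.65M live HashDocSets is ~90x the number of unique facet values,
        // which suggests filters are being created per request and retained
        // rather than drawn from a bounded cache.
    }
}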
Re: Troubleshooting java heap out-of-memory
On 4/2/07, Jeff Rodenburg [EMAIL PROTECTED] wrote:
> We are doing incremental updates, and we optimize quite a bit. mergeFactor
> presently set to 10.
>
> maxDoc count = 144156
> numDocs count = 144145

What version of Solr are you using? Another potential OOM (multiple threads
generating the same FieldCache entry) was fixed in later versions of Lucene
included with Solr.

-Yonik
Re: Troubleshooting java heap out-of-memory
Sorry for the confusion. We do have caching disabled. I was asking the
question because I wasn't certain if the configurable cache settings applied
throughout, or if the FieldCache in Lucene still came into play.

The two integer-based facets are single-valued per document. The string-based
facet is multiValued.

On 4/2/07, Chris Hostetter [EMAIL PROTECTED] wrote:
> : values = 50. More to the point, is there an algorithm that I can use to
> : estimate the cache consumption rate for facet queries?
>
> I'm confused ... I thought you said in your original mail that you had all
> the caching disabled? (except for FieldCache, which is so low level in
> Lucene it's always used)
>
> Are the fields you are faceting on multiValued or single-valued?
>
> -Hoss
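For context on why FieldCache is always in play: it is a Lucene-level cache
keyed on the IndexReader and populated lazily the first time a field is sorted
on, entirely outside Solr's configurable caches. A minimal sketch with the
Lucene 2.x-era API (the field name and index path here are hypothetical):

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.FieldCache;

// Minimal sketch: FieldCache.DEFAULT holds one entry per field per
// IndexReader, with one value slot per document (maxDoc entries).
public class FieldCacheDemo {
    public static void main(String[] args) throws Exception {
        IndexReader reader = IndexReader.open(args[0]); // index directory path
        try {
            // Built lazily on first use; "someField" is a hypothetical
            // single-valued string field.
            String[] values = FieldCache.DEFAULT.getStrings(reader, "someField");
            System.out.println("FieldCache entries: " + values.length);
        } finally {
            reader.close();
        }
    }
}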
Re: Troubleshooting java heap out-of-memory
Major version is 1.0. The bits are from a nightly build from early September
2006. We do have plans to upgrade Solr soon.

On 4/2/07, Yonik Seeley [EMAIL PROTECTED] wrote:
> On 4/2/07, Jeff Rodenburg [EMAIL PROTECTED] wrote:
> > We are doing incremental updates, and we optimize quite a bit.
> > mergeFactor presently set to 10.
> >
> > maxDoc count = 144156
> > numDocs count = 144145
>
> What version of Solr are you using? Another potential OOM (multiple threads
> generating the same FieldCache entry) was fixed in later versions of Lucene
> included with Solr.
>
> -Yonik
Re: Troubleshooting java heap out-of-memory
Yonik - is this the JIRA entry you're referring to?
http://issues.apache.org/jira/browse/LUCENE-754

On 4/2/07, Yonik Seeley [EMAIL PROTECTED] wrote:
> On 4/2/07, Jeff Rodenburg [EMAIL PROTECTED] wrote:
> > We are doing incremental updates, and we optimize quite a bit.
> > mergeFactor presently set to 10.
> >
> > maxDoc count = 144156
> > numDocs count = 144145
>
> What version of Solr are you using? Another potential OOM (multiple threads
> generating the same FieldCache entry) was fixed in later versions of Lucene
> included with Solr.
>
> -Yonik
Re: Troubleshooting java heap out-of-memory
On 4/2/07, Jeff Rodenburg [EMAIL PROTECTED] wrote:
> Yonik - is this the JIRA entry you're referring to?
> http://issues.apache.org/jira/browse/LUCENE-754

Yes. But from the heap dump you provided, that doesn't look like the issue.

-Yonik