Hi, recently we've been experiencing OOMEs (GC overhead limit exceeded) in our searches, so I'd like some clarification on heap and cache configuration.
This is the situation:
- Solr 1.4.1 running on Tomcat 6, Sun JVM 1.6.0_13 64bit
- JVM heap params: -Xmx8G -XX:MaxPermSize=256m -XX:NewSize=2G -XX:MaxNewSize=2G -XX:SurvivorRatio=6 -XX:+UseParallelOldGC -XX:+UseParallelGC
- The machine has 32 GB RAM
- Currently the machine has 4 processors/cores; this will be reduced to 2 cores in the future.
- The index size on the filesystem is ~9.5 GB
- The index contains ~5,500,000 documents
- ~1,500,000 of those docs are available for searches/queries; the rest are inactive docs that are excluded from searches (via a flag/field) but are still stored in the index, as they need to be retrievable by id (Solr is the main document store in this app)
- Caches are configured with a big size (the idea was to prevent filesystem access / disk I/O as much as possible):
  - filterCache (solr.LRUCache): size=200000, initialSize=30000, autowarmCount=1000, actual size ~60,000, hit ratio ~0.99
  - documentCache (solr.LRUCache): size=200000, initialSize=100000, autowarmCount=0, actual size ~160,000-190,000, hit ratio ~0.74
  - queryResultCache (solr.LRUCache): size=200000, initialSize=30000, autowarmCount=10000, actual size ~10,000-60,000, hit ratio ~0.71
- Searches are performed against a catchall text field using the standard request handler; all fields are fetched (no fl specified)
- Normally there are ~5 concurrent requests, with peaks up to 30 or 40 (mostly during GC)
- Recently we also added weighted search on special fields, so a query might become something like q=(some query) OR name_weighted:(some query)^2.0 OR brand_weighted:(some query)^4.0 OR longDescription_weighted:(some query)^0.5 (it seemed as if this was the cause of the OOMEs, but IMHO it only increased RAM usage to the point where GC could no longer free enough memory)

The OOMEs we get are of type "GC overhead limit exceeded"; one of them was thrown during auto-warming.
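For reference, the cache section of our solrconfig.xml looks roughly like this (reconstructed from the numbers above, so attribute order may differ from the actual file):

```xml
<query>
  <!-- hit ratio ~0.99, actual size ~60,000 -->
  <filterCache class="solr.LRUCache"
               size="200000" initialSize="30000" autowarmCount="1000"/>
  <!-- hit ratio ~0.74, actual size ~160,000-190,000; not autowarmed -->
  <documentCache class="solr.LRUCache"
                 size="200000" initialSize="100000" autowarmCount="0"/>
  <!-- hit ratio ~0.71, actual size ~10,000-60,000 -->
  <queryResultCache class="solr.LRUCache"
                    size="200000" initialSize="30000" autowarmCount="10000"/>
</query>
```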
I checked two different heap dumps, the first one autogenerated (by -XX:+HeapDumpOnOutOfMemoryError), the second one generated manually via jmap. They show the following distribution of used memory.

The autogenerated dump:
- documentCache: 56% (size ~195,000)
- filterCache: 15% (size ~60,000)
- queryResultCache: 8% (size ~61,000)
- fieldCache: 6% (fieldCache referenced by WebappClassLoader)
- SolrIndexSearcher: 2%

The manually generated dump:
- documentCache: 48% (size ~195,000)
- filterCache: 20% (size ~60,000)
- fieldCache: 11% (fieldCache referenced by WebappClassLoader)
- queryResultCache: 7% (size ~61,000)
- fieldValueCache: 3%

We are also running two search engines with a 17 GB heap; these don't run into OOMEs. However, with these bigger heaps the longest requests take even longer due to longer stop-the-world GC cycles. Therefore my goal is to run with a smaller heap; IMHO even smaller than 8 GB would be good, to reduce the time needed for a full GC.

So what's the right path to follow now? What would you recommend changing in the configuration (Solr/JVM)? Would you say it is OK to reduce the cache sizes? Would this increase disk I/O, or would the index be held in the OS's disk cache? Do you have other recommendations or questions?

Thanks && cheers,
Martin