Re: SolrCloud OOM Problem
Great info. Can I ask how much data you are handling with that 6G or 7G heap?
Re: SolrCloud OOM Problem
Have you used a queue to intercept queries, and if so, what was your implementation? We are indexing huge amounts of data from 7 SolrJ instances which run independently, so there's a lot of concurrent indexing.
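For reference on the indexing side, SolrJ does ship a queued update client that bounds the concurrent load each indexer puts on Solr; a minimal sketch (the URL, field names, queue size, and thread count here are illustrative only, not our actual code):

import java.io.IOException;

import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class QueuedIndexer {
    public static void main(String[] args) throws IOException, SolrServerException {
        // Documents accumulate in an internal bounded queue and a fixed
        // thread pool drains it, so bursts from the feed do not translate
        // directly into concurrent requests against Solr.
        ConcurrentUpdateSolrServer server = new ConcurrentUpdateSolrServer(
                "http://solr1:8983/solr/collection1", 1000, 4);

        for (int i = 0; i < 100000; i++) {
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", "doc-" + i);
            doc.addField("title_t", "document " + i);
            server.add(doc); // returns quickly; the queue absorbs bursts
        }

        server.blockUntilFinished(); // wait for the internal queue to drain
        server.commit();
        server.shutdown();
    }
}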
Re: SolrCloud OOM Problem
On 8/13/2014 5:34 AM, tuxedomoon wrote:
> Great info. Can I ask how much data you are handling with that 6G or 7G heap?

My dev server is the one with the 7GB heap. My production servers only handle half the index shards, so they have the smaller heap. Here is the index size info from my dev server:

[root@bigindy5 ~]# du -sh /index/solr4/data/
131G    /index/solr4/data/

This represents about 116 million total documents.

Thanks,
Shawn
Re: SolrCloud OOM Problem
On 8/13/2014 5:42 AM, tuxedomoon wrote:
> Have you used a queue to intercept queries, and if so, what was your implementation? We are indexing huge amounts of data from 7 SolrJ instances which run independently, so there's a lot of concurrent indexing.

On my setup, the queries come from a Java webapp that uses SolrJ, running on multiple servers in a cluster. The updates come from a custom SolrJ application that I wrote. There is no queue; Solr is more than capable of handling the load that we give it. Full rebuilds are done with the dataimport handler. The source of all our Solr data is a MySQL database.

Thanks,
Shawn
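For anyone unfamiliar with SolrJ, the client side of both halves is small; a minimal sketch of a cloud-aware client doing one update and one query (the ZooKeeper host string and collection name are placeholders, and this is illustrative rather than the actual application code):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CloudSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrInputDocument;

public class SolrJExample {
    public static void main(String[] args) throws Exception {
        // Cloud-aware client: reads cluster state from ZooKeeper and
        // routes requests to the appropriate shard leaders and replicas.
        CloudSolrServer server = new CloudSolrServer("zk1:2181,zk2:2181,zk3:2181");
        server.setDefaultCollection("collection1"); // placeholder name

        // Update side: add one document and commit.
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "example-1");
        server.add(doc);
        server.commit();

        // Query side: run a match-all query and report the hit count.
        QueryResponse rsp = server.query(new SolrQuery("*:*"));
        System.out.println("numFound: " + rsp.getResults().getNumFound());

        server.shutdown();
    }
}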
Re: SolrCloud OOM Problem
I applied the OPTS you pointed me to; here's the full string:

CATALINA_OPTS="${CATALINA_OPTS} \
 -XX:NewSize=1536m -XX:MaxNewSize=1536m \
 -Xms12288m -Xmx12288m \
 -XX:NewRatio=3 -XX:SurvivorRatio=4 \
 -XX:TargetSurvivorRatio=90 -XX:MaxTenuringThreshold=8 \
 -XX:+UseConcMarkSweepGC -XX:+CMSScavengeBeforeRemark \
 -XX:PretenureSizeThreshold=64m -XX:CMSFullGCsBeforeCompaction=1 \
 -XX:+UseCMSInitiatingOccupancyOnly -XX:CMSInitiatingOccupancyFraction=70 \
 -XX:CMSTriggerPermRatio=80 -XX:CMSMaxAbortablePrecleanTime=6000 \
 -XX:+CMSParallelRemarkEnabled -XX:+ParallelRefProcEnabled \
 -XX:+UseLargePages -XX:+AggressiveOpts"

jConsole is now showing lower heap usage. It had been climbing to 12G consistently; now it only spikes to 10G every 10 minutes or so. Here's my top output
===
  PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+ COMMAND
 4250 root  20   0  129g  14g 1.9g S  2.0 21.3 17:40.61 java
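For anyone who wants the same numbers without a jConsole session, the heap figures jConsole graphs are also reachable through the standard java.lang.management API; a minimal sketch (run it inside the JVM being watched, or adapt it to a remote JMX connection the way jConsole does):

import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

public class HeapWatch {
    public static void main(String[] args) throws InterruptedException {
        MemoryMXBean mem = ManagementFactory.getMemoryMXBean();
        while (true) {
            // The same "used vs. max" heap numbers that jConsole plots.
            MemoryUsage heap = mem.getHeapMemoryUsage();
            System.out.printf("heap used: %dM of %dM max%n",
                    heap.getUsed() / (1024 * 1024),
                    heap.getMax() / (1024 * 1024));
            Thread.sleep(10000L); // poll every 10 seconds
        }
    }
}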
Re: SolrCloud OOM Problem
On Tue, 2014-08-12 at 01:27 +0200, dancoleman wrote:
> My SolrCloud of 3 shard / 3 replicas is having a lot of OOM errors. Here are some specs on my setup:
>
> hosts: all are EC2 m1.large with 250G data volumes

Is that 3 (each running a primary and a replica shard) or 6 instances?

> documents: 120M total
> zookeeper: 5 external t1.micros

If your facet fields have many unique values and you have many concurrent requests, then memory usage will be high. But by the looks of it, I guess that the facet fields have relatively few values? Still, if you have many concurrent queries, you might consider using a queue in front of your SolrCloud instead of just starting new requests, in order to set an effective limit on heap usage.

- Toke Eskildsen, State and University Library, Denmark
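To sketch the kind of throttling I mean, done client-side with SolrJ (the URL and the permit count are illustrative; the right bound depends on your hardware and query mix):

import java.util.concurrent.Semaphore;

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

// Caps the number of in-flight queries so a burst of requests cannot
// drive heap usage on the Solr side arbitrarily high.
public class ThrottledSolrClient {
    private final HttpSolrServer server =
            new HttpSolrServer("http://solr1:8983/solr/collection1"); // placeholder URL
    private final Semaphore permits = new Semaphore(8); // max concurrent queries

    public QueryResponse query(SolrQuery q)
            throws SolrServerException, InterruptedException {
        permits.acquire(); // callers beyond the limit queue up here
        try {
            return server.query(q);
        } finally {
            permits.release();
        }
    }
}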
Re: SolrCloud OOM Problem
I have modified my instances to m2.4xlarge 64-bit with 68.4G memory. Hate to ask this, but can you recommend Java memory and GC settings for 90G data and the above memory? Currently I have

CATALINA_OPTS="${CATALINA_OPTS} \
 -XX:NewSize=1536m -XX:MaxNewSize=1536m \
 -Xms5120m -Xmx5120m \
 -XX:+UseParNewGC -XX:+CMSParallelRemarkEnabled -XX:+UseConcMarkSweepGC"

Doesn't this mean I am starting with 5G and never going over 5G? I've seen a few of those uninverted multi-valued field OOMs already on the upgraded host.

Thanks
Tux
Re: SolrCloud OOM Problem
On 8/12/2014 3:12 PM, tuxedomoon wrote:
> I have modified my instances to m2.4xlarge 64-bit with 68.4G memory. Hate to ask this, but can you recommend Java memory and GC settings for 90G data and the above memory? Currently I have
>
> CATALINA_OPTS="${CATALINA_OPTS} -XX:NewSize=1536m -XX:MaxNewSize=1536m -Xms5120m -Xmx5120m -XX:+UseParNewGC -XX:+CMSParallelRemarkEnabled -XX:+UseConcMarkSweepGC"
>
> Doesn't this mean I am starting with 5G and never going over 5G?

Yes, that's exactly what it means -- you have a heap size limit of 5GB. The OutOfMemory error indicates that Solr needs more heap space than it is getting. You'll need to raise the -Xmx value, and it is usually advisable to configure -Xms to match. The wiki page I linked before includes a link to the following page, listing the GC options that I use beyond the -Xmx setting:

http://wiki.apache.org/solr/ShawnHeisey#GC_Tuning

Thanks,
Shawn
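As a sanity check that Tomcat actually picked up a changed -Xmx, the effective limit can be read from inside the running JVM with the standard Runtime and JMX APIs; a minimal sketch (nothing Solr-specific, and the class name is made up):

import java.lang.management.ManagementFactory;

public class HeapLimit {
    public static void main(String[] args) {
        // maxMemory() reflects the effective -Xmx of this JVM.
        long maxMb = Runtime.getRuntime().maxMemory() / (1024 * 1024);
        System.out.println("Effective heap limit: " + maxMb + "MB");

        // The raw JVM arguments are also visible via JMX, which is handy
        // for confirming that CATALINA_OPTS made it onto the command line.
        System.out.println(ManagementFactory.getRuntimeMXBean().getInputArguments());
    }
}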
Re: SolrCloud OOM Problem
On 8/11/2014 5:27 PM, dancoleman wrote:
> My SolrCloud of 3 shard / 3 replicas is having a lot of OOM errors. Here are some specs on my setup:
>
> hosts: all are EC2 m1.large with 250G data volumes
> documents: 120M total
> zookeeper: 5 external t1.micros
<snip>
> Linux top command output with no indexing
> ===
>   PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+ COMMAND
>  8654 root  20   0 95.3g 6.4g 1.1g S 27.6 87.4  83:46.19 java
>
> Linux top command output with indexing
> ===
>   PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+ COMMAND
> 12499 root  20   0 95.8g 5.8g 556m S 164.3 80.2 110:40.99 java

I think you're likely going to need a much larger heap than 5GB, or you're going to need a lot more machines and shards, so that each machine has a much smaller piece of the index.

The java heap is only one part of the story here, though. Solr performance is terrible when the OS cannot effectively cache the index, because Solr must actually read the disk to get the data required for a query. Disks are incredibly SLOW. Even SSD storage is a *lot* slower than RAM.

Your setup does not have anywhere near enough memory for the size of your shards. Amazon's website says that the m1.large instance has 7.5GB of RAM. You're allocating 5GB of that to Solr (the java heap) according to your startup options. If you subtract a little more for the operating system and basic system services, that leaves about 2GB of RAM for the disk cache. Based on the numbers from top, that Solr instance is handling nearly 90GB of index. 2GB of RAM for caching is nowhere near enough -- you will want between 32GB and 96GB of total RAM for that much index.

http://wiki.apache.org/solr/SolrPerformanceProblems#RAM

Thanks,
Shawn
Re: SolrCloud OOM Problem
90G is correct; each host is currently holding that much data. Are you saying that 32GB to 96GB would be needed for each host? Assuming we did not add more shards, that is.
Re: SolrCloud OOM Problem
> 90G is correct; each host is currently holding that much data. Are you saying that 32GB to 96GB would be needed for each host? Assuming we did not add more shards, that is.

If you want good performance and enough memory to give Solr the heap it will need, yes. Lucene (the search API that Solr uses) relies on good operating system caching for the index. Having enough memory to cache the ENTIRE index is not usually required, but it is recommended.

Alternatively, you can add a lot more hosts and create a new collection with a lot more shards. The total memory requirement across the whole cloud won't go down, but each host won't require as much.

Thanks,
Shawn