On 8/11/2014 5:27 PM, dancoleman wrote:
> My SolrCloud of 3 shard / 3 replicas is having a lot of OOM errors. Here
> are some specs on my setup:
>
> hosts: all are EC2 m1.large with 250G data volumes
> documents: 120M total
> zookeeper: 5 external t1.micros
<snip>
> Linux "top" command output with no indexing
> =======================================================
>   PID USER  PR NI  VIRT  RES  SHR S  %CPU %MEM     TIME+ COMMAND
>  8654 root  20  0 95.3g 6.4g 1.1g S  27.6 87.4  83:46.19 java
>
> Linux "top" command output with indexing
> =======================================================
>   PID USER  PR NI  VIRT  RES  SHR S  %CPU %MEM     TIME+ COMMAND
> 12499 root  20  0 95.8g 5.8g 556m S 164.3 80.2 110:40.99 java

I think you're likely going to need a much larger heap than 5GB, or you're
going to need a lot more machines and shards, so that each machine has a
much smaller piece of the index.

The Java heap is only one part of the story here, though. Solr performance
is terrible when the OS cannot effectively cache the index, because Solr
must actually read the disk to get the data required for a query. Disks
are incredibly SLOW. Even SSD storage is a *lot* slower than RAM.

Your setup does not have anywhere near enough memory for the size of your
shards. Amazon's website says that the m1.large instance has 7.5GB of RAM.
You're allocating 5GB of that to Solr (the Java heap) according to your
startup options. If you subtract a little more for the operating system
and basic system services, that leaves about 2GB of RAM for the disk
cache. Based on the numbers from top, that Solr instance is handling
nearly 90GB of index. 2GB of RAM for caching is nowhere near enough -- you
will want between 32GB and 96GB of total RAM for that much index.

http://wiki.apache.org/solr/SolrPerformanceProblems#RAM

Thanks,
Shawn
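P.S. The back-of-the-envelope arithmetic above, as a quick sketch. The RAM,
heap, and index figures are the numbers from this thread; the OS overhead
figure is my rough guess:

```python
# Rough disk-cache headroom estimate for one m1.large Solr node.
total_ram_gb = 7.5     # m1.large, per Amazon's specs
java_heap_gb = 5.0     # the Java heap from the startup options
os_overhead_gb = 0.5   # OS + basic system services (rough guess)
index_size_gb = 90.0   # approximate index size, per the "top" numbers

# Whatever RAM the heap and OS don't take is all the OS has for caching
# the index files.
cache_gb = total_ram_gb - java_heap_gb - os_overhead_gb
print(f"RAM left for disk cache: {cache_gb:.1f} GB")
print(f"Index is {index_size_gb / cache_gb:.0f}x larger than the cache")
```

That ratio is why queries end up hitting the actual disk almost every time.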