On 5/6/2015 1:58 AM, adfel70 wrote:
> I have a cluster of 16 shards, 3 replicas. The cluster indexes nested
> documents.
> It currently has 3 billion documents overall (parents and children).
> Each shard has around 200 million docs; the size of each shard is 250GB.
> This runs on 12 machines. Each machine has 4 SSD disks and 4 Solr
> processes. Each process has a 28GB heap. Each machine has 196GB RAM.
>
> I perform periodic indexing throughout the day. Each indexing cycle adds
> around 1.5 million docs. I keep the indexing load light - 2 processes
> with bulks of 20 docs.
>
> My use case demands that each indexing cycle will be visible only when
> the whole cycle finishes.
>
> I tried various methods of using soft and hard commits:
I personally would configure autoCommit on a five-minute interval (maxTime of 300000) with openSearcher=false. The use case you have outlined (not seeing changes until the indexing is done) demands that you do NOT turn on autoSoftCommit, and that you do one manual commit at the end of each indexing cycle, which could be either a soft commit or a hard commit. I would recommend a soft commit.

Because it is the openSearcher part of a commit that is very expensive, you can successfully do autoCommit with openSearcher=false on an interval as short as 10 or 15 seconds and not see much in the way of immediate performance loss. That commit is still not free, both in terms of resources and in terms of java heap garbage generated. The general advice with commits is to do them as infrequently as you can, which applies to ANY commit, not just those that make changes visible.

> with all methods I encounter pretty much the same problem:
> 1. heavy GCs when a soft commit is performed (methods 1,2) or when a
> hard commit with openSearcher=true is performed. These GCs cause heavy
> latency (average latency is 3 secs; latency during the problem is
> 80 secs).
> 2. if indexing cycles come too often, so that soft commits or hard
> commits (openSearcher=true) occur within a small interval of one
> another (around 5-10 minutes), I start getting many OOM exceptions.

If you're getting OOM, then either you need to change things so Solr requires less heap memory, or you need to increase the heap size. Changing things might mean either the config or how you use Solr.

Are you tuning your garbage collection? With a 28GB heap, tuning is not optional. It is so important that the startup scripts in 5.0 and 5.1 include it, even though the default max heap is 512MB.

Let's do some quick math on your memory. You have four instances of Solr on each machine, each with a 28GB heap. That's 112GB of memory allocated to Java. With 196GB total, you have approximately 84GB of RAM left over for caching your index.
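For anyone who wants to check the arithmetic, here is the memory math above, plus the filterCache entry size I mention further down (a filterCache entry is a bitset of maxDoc bits, so maxDoc/8 bytes). The figures are the ones from this thread:

```shell
# Back-of-envelope checks using the numbers reported in this thread.

# RAM left for the OS disk cache on each machine:
java_heap=$((4 * 28))            # 4 Solr instances x 28GB heap = 112GB to Java
os_cache=$((196 - java_heap))    # 196GB RAM - 112GB heap = 84GB for index caching

# Size of one filterCache entry: a bitset of maxDoc bits, i.e. maxDoc/8 bytes.
entry_mib=$((200000000 / 8 / 1048576))   # ~23MiB per entry at 200 million docs

echo "$java_heap $os_cache $entry_mib"
```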
A 16-shard index with three replicas means 48 cores. Divide that by 12 machines and that's 4 replicas on each server, presumably one in each Solr instance. You say that the size of each shard is 250GB, so you've got about a terabyte of index on each server, but only 84GB of RAM for caching it. Even with SSDs, that's not going to be anywhere near enough cache memory for good Solr performance.

All these memory issues, including GC tuning, are discussed on this wiki page:

http://wiki.apache.org/solr/SolrPerformanceProblems

One additional note: By my calculations, each filterCache entry will be at least 23MB in size. This means that if you are using the filterCache and the G1 collector, you will not be able to avoid humongous allocations - any allocation larger than half the G1 region size, and the maximum configurable G1 region size is 32MB. You should use the CMS collector for your GC tuning, not G1. If you can reduce the number of documents in each shard, G1 might work well.

Thanks,
Shawn