I've noticed the following messages in the infostream log around the times the pauses begin:
DW 0 [Tue Aug 28 13:25:29 UTC 2012; qtp435584308-969]: WARNING DocumentsWriter has stalled threads; waiting

________________________________________
From: Voth, Brad (GE Corporate)
Sent: Monday, August 27, 2012 4:54 PM
To: solr-user@lucene.apache.org
Subject: Injest pauses

Hello all,

I'm implementing a solution for a project with a very high indexing rate and a lower query rate. Our records are very small (2 small strings, 6 longs, 7 ints, and 2 dates), and we index 8 of those fields. We need to sustain an average ingest rate of 50k records/sec.

Through sharding and a reasonably sized cluster we've hit most of our performance goals. However, our producers tend to hang on a shard that is doing a merge. I've done some digging and found tips and hints on configuring merging, but I haven't been able to get rid of the issue.

During a problematic period, the server hosting the shard shows a single CPU core at 100%, very little disk I/O, and merge messages in the logs. This leads me to believe that a single merge thread is blocking indexing. When this happens, our producers, which distribute their updates across all the shards, pile up on this shard and wait. Our overall ingest rate then plummets, and this is currently keeping us from going to production with the solution.

The relevant portion of our solrconfig.xml is:

<indexConfig>
  ....
  <ramBufferSizeMB>1024</ramBufferSizeMB>
  <mergeFactor>20</mergeFactor>
  <mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
    <int name="maxMergeAtOnce">8</int>
    <int name="segmentsPerTier">20</int>
  </mergePolicy>
  <mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler">
    <int name="maxMergeCount">10</int>
    <int name="maxThreadCount">10</int>
  </mergeScheduler>
  <autoCommit>
    <maxTime>1500000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>
  <autoSoftCommit>
    <maxTime>10000</maxTime>
  </autoSoftCommit>
  ....
</indexConfig>

Most of the settings above came out of many trial runs, and each change made very little difference.

Any thoughts?

Brad Voth
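A toy model helps show why one merging shard can drag down the whole cluster's ingest rate. The sketch below is hypothetical Python, not Solr or Lucene code. It assumes each producer sends one update request at a time, blocks until the shard responds, and cycles round-robin over the shards. The function names and the tick costs (1 tick for a healthy shard, 25 for the stalled one) are invented for illustration.

```python
"""Toy model (hypothetical, not Solr code): synchronous producers that
round-robin updates across shards all end up waiting on one slow shard."""


def completed_requests(num_shards: int, slow_shard: int, slow_cost: int,
                       horizon: int, start: int = 0) -> int:
    """Count requests one producer finishes within `horizon` ticks.

    Assumption: each request blocks the producer until the shard answers.
    A healthy shard costs 1 tick; the slow (merging) shard costs `slow_cost`.
    """
    clock, done, shard = 0, 0, start
    while True:
        cost = slow_cost if shard == slow_shard else 1
        if clock + cost > horizon:
            return done
        clock += cost
        done += 1
        shard = (shard + 1) % num_shards  # round-robin to the next shard


def aggregate_rate(num_producers: int, num_shards: int, slow_cost: int,
                   horizon: int = 10_000) -> float:
    """Requests per tick across all producers; shard 0 is the slow one."""
    total = sum(
        completed_requests(num_shards, 0, slow_cost, horizon,
                           start=p % num_shards)  # stagger starting shards
        for p in range(num_producers)
    )
    return total / horizon


if __name__ == "__main__":
    healthy = aggregate_rate(num_producers=16, num_shards=8, slow_cost=1)
    stalled = aggregate_rate(num_producers=16, num_shards=8, slow_cost=25)
    print(f"all shards healthy: {healthy:.2f} requests/tick")
    print(f"one shard stalled:  {stalled:.2f} requests/tick")
```

With these made-up numbers, the aggregate rate falls from 16 to roughly 4 requests per tick. Each producer's round-robin cycle over 8 shards now costs 7 + 25 = 32 ticks instead of 8. This matches the pattern in the message: the stalled shard sets the pace for every producer, not just its own share of the traffic.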