Hi Will,

Have you investigated not using EBS volumes at all? I'm not sure what node size you're using, but, for example, you can build a RAID 0 array out of the four instance store volumes on an m1.xlarge and get a lot of disk bandwidth. Also, there are some nice SSD instances available now: http://www.ec2instances.info/
That's assuming disk throughput is your problem. Have you tried using iostat or top to discover what your iowait% is during these merges?

Michael Della Bitta
Applications Developer
o: +1 646 532 3062
appinions inc.
“The Science of Influence Marketing”
18 East 41st Street
New York, NY 10017
t: @appinions <https://twitter.com/Appinions> | g+: plus.google.com/appinions <https://plus.google.com/u/0/b/112002776285509593336/112002776285509593336/posts>
w: appinions.com <http://www.appinions.com/>

On Thu, Jan 16, 2014 at 3:08 PM, Will Butler <w...@butlerhq.com> wrote:

> We currently have a SolrCloud cluster that contains two collections which
> we toggle between for querying and indexing. When bulk indexing to our
> “offline” collection, our query performance from the “online” collection
> suffers somewhat. When segment merges occur, it gets downright abysmal. We
> have adjusted several settings that affect flushing and/or merging and have
> tried increasing the IOPs capacity of our volumes, without much success.
> The best recommendation seems to be to simply have enough ram on each node
> for the index to fit into memory (plus additional memory which may be
> required for indexing). If this isn’t feasible, it seems that there is no
> way around the fact that flushes and merges will potentially take up IO
> resources needed for responding to queries. We are currently experimenting
> with throttling flushes and merges using maxWriteMBPerSec* settings, which
> seems to help if set to fairly low values. Does anyone have any other
> recommendations for optimizing SolrCloud to handle both heavy indexing and
> querying?
>
> Thanks,
>
> Will
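
P.S. If it helps with the iowait check suggested above, here is a rough sketch of how the figure could be sampled without iostat. It assumes a Linux node, where the first line of /proc/stat holds cumulative CPU counters (user, nice, system, idle, iowait, ...). The function names are made up for illustration and are not part of Solr or any monitoring tool.

```python
import time


def parse_cpu_line(line):
    """Return the integer counters from the aggregate 'cpu' line of /proc/stat."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    return [int(value) for value in fields[1:]]


def iowait_percent(before, after):
    """Percentage of CPU time spent in iowait between two samples."""
    deltas = [a2 - a1 for a1, a2 in zip(before, after)]
    # Only the first eight counters are summed (user .. steal); the guest
    # counters that follow are already included in user and nice.
    total = sum(deltas[:8])
    if total <= 0:
        return 0.0
    # Index 4 is the iowait counter.
    return 100.0 * deltas[4] / total


def sample_iowait(interval=5.0, path="/proc/stat"):
    """Read the counters twice, `interval` seconds apart (Linux only)."""
    def read():
        with open(path) as handle:
            return parse_cpu_line(handle.readline())

    before = read()
    time.sleep(interval)
    after = read()
    return iowait_percent(before, after)
```

Calling sample_iowait(5) in a loop while a bulk index is running, and comparing the numbers during a merge against an idle baseline, should show whether the nodes are really waiting on disk.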