Hi, Will,

Have you investigated not using EBS volumes at all? I'm not sure what node
size you're using, but for example, you can build a RAID 0 out of the four
instance volumes on an m1.xlarge and get lots of disk bandwidth. Also,
there are some nice SSD-backed instance types available now; see
http://www.ec2instances.info/ for a comparison.

That's assuming disk throughput is your problem. Have you tried using
iostat or top to discover what your iowait% is during these merges?
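If iostat isn't handy on the nodes, a rough iowait figure can also be read
straight from /proc/stat. This is a minimal sketch, assuming a Linux host
where the first line of /proc/stat is the aggregate "cpu" line with counters
in the kernel's order (user, nice, system, idle, iowait, irq, softirq,
steal). The helper names below are hypothetical, not part of any tool:

```python
import time

def parse_cpu_line(line):
    """Parse the aggregate 'cpu' line of /proc/stat into integer tick counters."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line of /proc/stat")
    # Keep user, nice, system, idle, iowait, irq, softirq, steal.
    # guest/guest_nice are skipped because the kernel already counts them in user/nice.
    return [int(x) for x in fields[1:9]]

def iowait_percent(before, after):
    """Share of CPU time spent in iowait between two counter snapshots, in percent."""
    deltas = [a - b for a, b in zip(after, before)]
    total = sum(deltas)
    if total <= 0:
        return 0.0
    return 100.0 * deltas[4] / total  # index 4 is the iowait counter

def read_cpu_counters(path="/proc/stat"):
    """Read and parse the first line of /proc/stat (Linux only)."""
    with open(path) as f:
        return parse_cpu_line(f.readline())

def sample_iowait(interval=1.0):
    """Take two snapshots `interval` seconds apart and return the iowait percentage."""
    before = read_cpu_counters()
    time.sleep(interval)
    after = read_cpu_counters()
    return iowait_percent(before, after)
```

Calling print(f"iowait: {sample_iowait():.1f}%") on a node while a merge is
running gives one sample; a sustained high value would suggest the queries
are waiting on disk rather than CPU.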


Michael Della Bitta

Applications Developer

o: +1 646 532 3062

appinions inc.

“The Science of Influence Marketing”

18 East 41st Street

New York, NY 10017

t: @appinions <https://twitter.com/Appinions> | g+: plus.google.com/appinions <https://plus.google.com/u/0/b/112002776285509593336/112002776285509593336/posts>
w: appinions.com <http://www.appinions.com/>


On Thu, Jan 16, 2014 at 3:08 PM, Will Butler <w...@butlerhq.com> wrote:

> We currently have a SolrCloud cluster that contains two collections which
> we toggle between for querying and indexing. When bulk indexing to our
> “offline” collection, our query performance from the “online” collection
> suffers somewhat. When segment merges occur, it gets downright abysmal. We
> have adjusted several settings that affect flushing and/or merging and have
> tried increasing the IOPS capacity of our volumes, without much success.
> The best recommendation seems to be to simply have enough RAM on each node
> for the index to fit into memory (plus additional memory which may be
> required for indexing). If this isn’t feasible, it seems that there is no
> way around the fact that flushes and merges will potentially take up IO
> resources needed for responding to queries. We are currently experimenting
> with throttling flushes and merges using maxWriteMBPerSec* settings, which
> seems to help if set to fairly low values. Does anyone have any other
> recommendations for optimizing SolrCloud to handle both heavy indexing and
> querying?
>
> Thanks,
>
> Will
