On 10/21/2013 9:48 AM, Tom Mortimer wrote:
Hi everyone,
I've been working on an installation recently which uses SolrCloud to index
45M documents into 8 shards on 2 VMs running 64-bit Ubuntu (with another 2
identical VMs set up for replicas). The reason we're using so many shards
for a relatively small index is that there are complex filtering
requirements at search time, to restrict users to items they are licensed
to view. Initial tests demonstrated that multiple shards would be required.
The total size of the index is about 140GB, and each VM has 16GB RAM (32GB
total) and 4 CPU units. I know this is far under what would normally be
recommended for an index of this size, and I'm working on persuading the
customer to increase the RAM (basically, telling them it won't work
otherwise.) Performance is currently pretty poor and I would expect more
RAM to improve things. However, there are a couple of other oddities which
concern me.
Running multiple shards like you are, where each operating system is
handling more than one shard, is only going to perform better if your
query volume is low and you have lots of CPU cores. If your query
volume is high or you only have 2-4 CPU cores on each VM, you might be
better off with fewer shards, or with no sharding at all.
The way that I read this is that you've got two physical machines with
32GB RAM, each running two VMs that have 16GB. Each VM houses 4 shards,
or 70GB of index.
There's a scenario that might be better if all of the following are
true:

1) I'm right about how your hardware is provisioned.
2) You or the client owns the hardware.
3) You have an extremely low-end third machine available - a single
CPU with 1GB of RAM would probably be enough.

In this scenario, you run one Solr instance and one zookeeper instance
on each of your two "big" machines, and use the third wimpy machine as
a third zookeeper node. No virtualization. For the rest of my reply,
I'm assuming that you haven't taken this step, but it will probably
apply either way.
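If you do go that route, it would look roughly like this: run a
standalone zookeeper on all three machines, then start a single Solr
on each of the two big machines pointed at the full ensemble. With the
Solr 4.x example layout, the startup would be something along these
lines (hostnames and heap size are just placeholders for your own
setup):

  java -Xmx4g -DzkHost=big1:2181,big2:2181,wimpy:2181 -jar start.jar

The exact heap is something you'd have to tune; the important part is
one JVM per machine, all pointed at a three-node zookeeper ensemble.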
The first is that I've been reindexing a fixed set of 500 docs to test
indexing and commit performance (with soft commits within 60s). The time
taken to complete a hard commit after this is longer than I'd expect, and
highly variable - from 10s to 70s. This makes me wonder whether the SAN
(which provides all the storage for these VMs and the customer's several
other VMs) is being saturated periodically. I grabbed some iostat output on
different occasions to (possibly) show the variability:
Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sdb              64.50         0.00      2476.00          0       4952
...
sdb               8.90         0.00       348.00          0       6960
...
sdb               1.15         0.00        43.20          0        864
There are two likely possibilities for this. One or both of them might
be in play.

1) Because the OS disk cache is small, not much of the index can be
cached. This can result in a lot of disk I/O for a commit, slowing
things way down. Increasing the size of the OS disk cache is really
the only solution for that.

2) Cache autowarming, particularly the filter cache. In the cache
statistics, you can see how long each cache took to warm up after the
last searcher was opened. The solution for that is to reduce the
autowarmCount values.
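For example, the filterCache definition in solrconfig.xml would end up
looking something like this (the sizes here are only illustrative, the
important bit is a small or zero autowarmCount):

  <filterCache class="solr.FastLRUCache"
               size="512"
               initialSize="512"
               autowarmCount="0"/>

With autowarmCount at zero, a new searcher doesn't re-execute cached
filters before it goes live, so commits become visible faster at the
cost of the first few queries afterwards being slower.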
The other thing that confuses me is that after a Solr restart or hard
commit, search times average about 1.2s under light load. After searching
the same set of queries for 5-6 iterations this improves to 0.1s. However,
in either case - cold or warm - iostat reports no device reads at all:
Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sdb               0.40         0.00         8.00          0        160
...
sdb               0.30         0.00        10.40          0        104
(the writes are due to logging). This implies to me that the 'hot' blocks
are being completely cached in RAM - so why the variation in search time
and the number of iterations required to speed it up?
Linux is pretty good about making limited OS disk cache resources work.
Sounds like the caching is working reasonably well for queries. It
might not be working so well for updates or commits, though.
Running multiple Solr JVMs per machine, virtual or not, causes more
problems than it solves. Solr has no limits on the number of cores
(shard replicas) per instance, assuming there are enough system
resources. There should be exactly one Solr JVM per operating system.
Running more than one results in quite a lot of overhead, and your
memory is precious. When you create a collection, you can give the
collections API the "maxShardsPerNode" parameter to create more than one
shard per instance.
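The CREATE call would look something like this (the collection name is
just an example, and the counts assume your 8 shards with one replica
each spread across two Solr instances):

  http://big1:8983/solr/admin/collections?action=CREATE&name=mycollection&numShards=8&replicationFactor=2&maxShardsPerNode=8

With two nodes, numShards=8 and replicationFactor=2 work out to 8 cores
on each node, so maxShardsPerNode needs to be at least 8.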
I don't have a great deal of experience in low-level performance tuning, so
please forgive any naivety. Any ideas of what to do next would be greatly
appreciated. I don't currently have details of the VM implementation,
but I can get hold of them if they're relevant.
I don't think the virtualization details matter all that much. Please
feel free to ask questions or supply more info based on what I've told you.
Thanks,
Shawn