On 1/8/2015 3:16 AM, Toke Eskildsen wrote:
> On Wed, 2015-01-07 at 22:26 +0100, Joseph Obernberger wrote:
>> Thank you Toke - yes - the data is indexed throughout the day. We are
>> handling very few searches - probably 50 a day; this is an R&D system.
> If your searches are in small bundles, you could pause the indexing
> flow while the searches are executed, for better performance.
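>
> Something along these lines could do the coordination (an untested
> sketch with made-up names, not tied to any Solr API; indexer threads
> share the read lock, a search burst takes the write lock so in-flight
> batches drain before the searches run):
>
>   import java.util.concurrent.locks.ReentrantReadWriteLock;
>
>   public class IndexPauser {
>       // Fair lock so a waiting search burst is not starved by a
>       // steady stream of index batches.
>       private final ReentrantReadWriteLock lock =
>               new ReentrantReadWriteLock(true);
>
>       // Indexer threads call this once per batch.
>       public void indexBatch(Runnable sendBatchToSolr) {
>           lock.readLock().lock();
>           try {
>               sendBatchToSolr.run();
>           } finally {
>               lock.readLock().unlock();
>           }
>       }
>
>       // Pauses new index batches while the searches execute.
>       public void runSearchBurst(Runnable searches) {
>           lock.writeLock().lock();
>           try {
>               searches.run();
>           } finally {
>               lock.writeLock().unlock();
>           }
>       }
>   }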
>> Our HDFS cache, I believe, is too small at 10GBytes per shard.
> That depends a lot on your corpus, your searches and the underlying
> storage. But with our current level of information, it is a good bet:
> 10GB of cache per 130GB (270GB?) of data is not a lot with spinning
> drives.
Yes - it would be 20GBytes of cache per 270GBytes of data, so roughly 7%
of the index fits in cache.
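
For reference, the block cache is sized in solrconfig.xml via the
HdfsDirectoryFactory. The parameter names below are from the Solr Ref
Guide; the values are just one way to arrive at 10GB per shard (slab
count x blocks per bank x 8KB block size):

  <directoryFactory name="DirectoryFactory" class="solr.HdfsDirectoryFactory">
    <str name="solr.hdfs.home">hdfs://namenode:8020/solr</str>
    <bool name="solr.hdfs.blockcache.enabled">true</bool>
    <!-- off-heap allocation; MaxDirectMemorySize must cover this -->
    <bool name="solr.hdfs.blockcache.direct.memory.allocation">true</bool>
    <!-- 80 slabs x 16384 blocks x 8KB = 10GB of block cache -->
    <int name="solr.hdfs.blockcache.slab.count">80</int>
    <int name="solr.hdfs.blockcache.blocksperbank">16384</int>
  </directoryFactory>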
>> Current parameters for running each shard are:
>> JAVA_OPTS="-XX:MaxDirectMemorySize=10g -XX:+UseLargePages -XX:NewRatio=3
>> [...]
>> -Xmx10752m"
> One Solr per shard? You could probably gain a bit by running one Solr
> per machine instead. Anyway, that is quite a high Xmx, but I presume
> you have measured the memory needs.
We've tried a lower Xmx, but we get OOM errors during faceting of large
datasets. Right now we're running two JVMs per physical box (2 shards
per box), but we're going to be changing that to one JVM and one shard
per box.
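
For what it's worth, the rough memory math per box with the current
settings (taking both figures from the JAVA_OPTS above):

  10 GB direct (block cache) + 10.5 GB heap (-Xmx10752m) ~= 20.5 GB/JVM
  x 2 JVMs per box                                       ~= 41 GB

so going to one JVM per box should free roughly 20GB that can go to a
bigger block cache or to the OS.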
>> I'd love to try SSDs, but don't have the budget at present to go that
>> route.
> We find the price/performance of SSDs + moderate RAM to be a much
> better deal than spinning drives + a lot of RAM, even when buying
> enterprise hardware. With consumer SSDs (which we use in our large
> server), the case for SSDs is even stronger. It all depends on the use
> pattern of course, but your setup with non-concurrent searches seems
> like it would fit well.
> Note: I am sure that RAM == index size would deliver very high
> performance. With enough RAM you could use tape to hold the index;
> whether it is cost effective is another matter.
Ha! Yes - our index is accessible via a 2400 baud modem, but we have
lots of cache! ;)
>> I'd really like to get the HDFS option to work well as it
>> reduces system complexity.
> That is very understandable. We examined the option of networked
> storage (Isilon) with underlying spindles, and it performed adequately
> for our needs up to 2-3TB of index data. Unfortunately the heavy random
> read load from Solr meant a noticeable degradation of other services
> using the networked storage. I am sure it could be solved with more
> centralized hardware, but in the end we found it cheaper and simpler to
> use local storage for search. This will of course differ across
> organizations and setups.
We're going to experiment with one shard per box and more RAM cache per
shard and see where that gets us; we'll also be adding more shards.
Thanks for the tips!
Interesting that you mention Isilon, as we're planning to do an eval of
their product this year where we'll be testing out their HDFS layer.
It's a potential way to balance compute and storage, since you can add
HDFS storage without adding compute.
> - Toke Eskildsen
-Joe