On Wed, 2015-01-07 at 22:26 +0100, Joseph Obernberger wrote:
> Thank you Toke - yes - the data is indexed throughout the day. We are
> handling very few searches - probably 50 a day; this is an R&D system.

If your searches come in small bundles, you could pause the indexing flow
while the searches are executed, for better performance.

> Our HDFS cache, I believe, is too small at 10GBytes per shard.

That depends a lot on your corpus, your searches and underlying storage.
But with our current level of information, it is a really good bet that
the cache is too small: Having 10GB cache per 130GB (270GB?) data is not
a lot with spinning drives.

> Current parameters for running each shard are:
> JAVA_OPTS="-XX:MaxDirectMemorySize=10g -XX:+UseLargePages -XX:NewRatio=3 [...]
> -Xmx10752m"

One Solr/shard? You could probably win a bit by having one Solr/machine
instead. Anyway, it's quite a high Xmx, but I presume you have measured
the memory needs.

> I'd love to try SSDs, but don't have the budget at present to go that
> route.

We find the price/performance for SSD + moderate RAM to be quite a bit
better than spinning drives + a lot of RAM, even when buying enterprise
hardware. With consumer SSDs (which we use in our large server) the SSD
setup is cheaper still. It all depends on the use pattern of course, but
your setup with non-concurrent searches seems like it would fit well.

Note: I am sure that RAM == index size would deliver very high
performance. With enough RAM you can use tape to hold the index. Whether
it is cost effective is another matter.

> I'd really like to get the HDFS option to work well as it
> reduces system complexity.

That is very understandable. We examined the option of networked storage
(Isilon) with underlying spindles, and it performed adequately for our
needs up to 2-3TB of index data. Unfortunately the heavy random read load
from Solr meant a noticeable degradation of other services using the
networked storage. I am sure it could be solved with more centralized
hardware, but in the end we found it cheaper and simpler to use local
storage for search. This will of course differ across organizations and
setups.

- Toke Eskildsen
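
For reference, the cache and memory figures discussed above can be put into a quick back-of-envelope calculation. This is only a rough sketch: the function names are made up for illustration, and it assumes that the 10GB cache lives in direct memory outside the Java heap. Cache coverage is also not the same as hit rate, which depends on the access pattern.

```python
def cache_coverage(cache_gb, index_gb):
    """Fraction of the index data that fits in the cache (capped at 1.0)."""
    if index_gb <= 0:
        raise ValueError("index_gb must be positive")
    return min(cache_gb / index_gb, 1.0)


def shard_memory_gb(heap_mb, direct_gb):
    """Heap (-Xmx, in MB) plus direct memory (in GB) for one shard's JVM.

    Assumes the two pools are separate and both can be fully used.
    """
    return heap_mb / 1024 + direct_gb


if __name__ == "__main__":
    # The shard size is uncertain in the thread (130GB or 270GB), so show both.
    for index_gb in (130, 270):
        print(f"{index_gb}GB index: {cache_coverage(10, index_gb):.1%} cached")
    # Values taken from the quoted JAVA_OPTS: -Xmx10752m, MaxDirectMemorySize=10g
    print(f"per-shard JVM memory: {shard_memory_gb(10752, 10):.1f}GB")
```

With the numbers from the thread this gives roughly 7.7% coverage for a 130GB shard, 3.7% for a 270GB shard, and about 20.5GB of JVM memory per shard.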