Hi Shawn,

You say that:

*... your documents are about 50KB each. That would translate to an index
that's at least 25GB*

I know we cannot give an exact figure, but in your experience, what is the
approximate ratio of document size to index size?

2013/4/9 Shawn Heisey <s...@elyograg.org>

> On 4/9/2013 2:10 PM, Manuel Le Normand wrote:
>
>> Thanks for replying.
>> My config:
>>
>>    - 40 dedicated servers, dual-core each
>>    - Running Tomcat servlet on Linux
>>    - 12 GB RAM per server, split half between OS and Solr
>>    - Complex queries (up to 30 conditions on different fields), 1 qps rate
>>
>> Sharding my index was done for two reasons, based on tests with
>> 2 servers (4 shards):
>>
>>    1. As the index grew above a few million docs, qTime rose greatly,
>>       while sharding the index into smaller pieces (about 0.5M docs) gave
>>       much better results, so I limited every shard to 0.5M docs.
>>    2. Tests showed I was CPU-bound during queries. As I have a low qps
>>       rate (emphasize: lower than expected qTime) and as a query runs
>>       single-threaded on each shard, it made sense to dedicate a CPU to
>>       each shard.
>>
>> For the same number of docs per shard I do expect a rise in total qTime,
>> for these reasons:
>>
>>    1. The response should wait for the slowest shard
>>    2. Merging the responses from 40 different shards takes time
>>
>> What I understand from your explanation is that it's the merging that
>> takes time, and as qTime ends only after the second retrieval phase, the
>> qTime on each shard will take longer. Meaning that during a significant
>> proportion of the first query phase (right after the [id,score] are
>> retrieved), all CPUs are idle except the response-merger thread running
>> on a single CPU. I thought of the merge as a simple sorting of
>> [id,score], far simpler than an additional 300 ms of CPU time.
>>
>> Why would a RAM increase improve my performance, as it's a
>> "response-merge" (CPU resource) bottleneck?
>>
>
> If you have not tweaked the Tomcat configuration, that can lead to
> problems, but if your total query volume is really only one query per
> second, this is probably not a worry for you. A tomcat connector can be
> configured with a maxThreads parameter. The recommended value there is
> 10000, but Tomcat defaults to 200.
>
> You didn't include the index sizes. There's half a million docs per
> shard, but I don't know what that translates to in terms of MB or GB of
> disk space.
>
> On another email thread you mention that your documents are about 50KB
> each. That would translate to an index that's at least 25GB, possibly
> more. That email thread also says that optimization for you takes an hour,
> further indications that you've got some really big indexes.
>
> You're saying that you have given 6GB out of the 12GB to Solr, leaving
> only 6GB for the OS and caching. Ideally you want to have enough RAM to
> cache the entire index, but in reality you can usually get away with
> caching between half and two thirds of the index. Exactly what ratio works
> best is highly dependent on your schema.
>
> If my numbers are even close to right, then you've got a lot more index on
> each server than available RAM. Based on what I can deduce, you would want
> 24 to 48GB of RAM per server. If my numbers are wrong, then this estimate
> is wrong.
>
> I would be interested in seeing your queries. If the complexity can be
> expressed as filter queries that get re-used a lot, the filter cache can be
> a major boost to performance. Solr's caches in general can make a big
> difference. There is no guarantee that caches will help, of course.
>
> Thanks,
> Shawn
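
To make the arithmetic in the quoted message concrete, here is a rough sketch in Python. The 50KB per document, 0.5M docs per shard, the 6GB given to Solr, and the "half to two thirds" caching rule of thumb all come from the thread. Everything else is an assumption for illustration: the function names are hypothetical, `index_ratio` (index size divided by raw document size) is exactly the unknown asked about above and is set to 1.0 only to reproduce the "at least 25GB" lower bound, and `shards_per_server` is a guess since the thread does not state it.

```python
def index_size_gb(docs_per_shard, doc_size_kb, index_ratio=1.0):
    """Estimate the on-disk size of one shard in GB (decimal units).

    index_ratio is an ASSUMPTION: index size / raw document size.
    1.0 reproduces the lower bound mentioned in the thread.
    """
    return docs_per_shard * doc_size_kb * index_ratio / 1_000_000


def ram_needed_gb(index_gb, cache_fraction, heap_gb, shards_per_server=1):
    """RAM per server = Solr heap + the part of the index the OS should cache.

    shards_per_server is an ASSUMPTION; the thread does not state it.
    """
    return heap_gb + index_gb * shards_per_server * cache_fraction


if __name__ == "__main__":
    shard_gb = index_size_gb(500_000, 50)
    print(f"Index per shard: {shard_gb:.1f} GB")

    # Rule of thumb from the thread: cache half to two thirds of the index,
    # ideally all of it.
    for label, fraction in (("half", 0.5), ("two thirds", 2 / 3), ("all", 1.0)):
        total = ram_needed_gb(shard_gb, fraction, heap_gb=6)
        print(f"Caching {label} of the index: about {total:.1f} GB RAM")
```

The output is only as good as the assumed ratio and shard count. A ratio above 1.0, or more than one shard per server, pushes the estimate up quickly.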