Some observations:

*#* The CPU load on idxa1 mostly never crosses the 91% mark, even if you increase the load (by increasing the number of threads). This matches my environment (I can never cross 90% on Linux even if I increase the load; on Windows, for some reason, I can never cross 65%).
*#* Similarly, the CPU load on idxa2 never crosses 50% (I guess this follows from the point above).

*#* Your system saturates at 10 threads (the qps peaks at this load). Increasing the load further (20, 100, 200 threads) only worsens the response time, while the qps remains the same.

*#* The Query-Time is anywhere between 25-100 ms; for 200 threads it is between 500-1400 ms. This is for a 'Static-Query' load. A 'Dynamic-Query' load would only worsen the Query-Time (it would probably also bring down the qps and the maximum CPU utilisation).

*#* The author has a hardware configuration similar to yours (idxa1), though he has not specified the OS. If it is Windows, I believe it might be a good idea to run 2 VMs on his box. If it is Linux, it might be better to decide once someone has run the test with a Dynamic-Query load. If the author's load is Static-Query, then one VM on his box should be fine, since 90% of the CPU can be consumed (however, he would lose on reliability and availability compared to 2 VMs).

Some other points:

*@* I would have liked to see the vmstat information for 10, 5, 7 and 8 threads (a small capture sketch appears at the end of this message).

*@* Also, if you could run the test for 7 and 8 threads (because the system saturates at 10 threads and the load is light at 5 threads).

*@* Can you please also do a load test for Dynamic-Queries with 5-10 threads? (Sorry for asking so much; please ignore these requests if they are too much.) I will do the same in my environment; a minimal sketch of the kind of benchmark I mean appears at the end of this message.

Deepak

On Sun, Mar 25, 2018 at 9:45 PM, Shawn Heisey <apa...@elyograg.org> wrote:

> On 3/25/2018 7:15 AM, Deepak Goel wrote:
>
>> $ Why is the 'qps' not increasing with increase in threads? (If I
>> understand the qps parameter right?)
>
> Likely because I sent all these queries to a single copy of the index.
> We only have two copies of the index in production, plus a third copy
> on a dev server running a newer version of Solr. I sent the queries
> from the test program to the production server pair that's designated
> "standby" -- not receiving queries unless the other pair is down.
>
> Our Solr servers do not handle a high query load. It's usually less
> than two queries per second.
>
> Handling a very high query load requires load balancing to multiple
> copies of the index (replicas in SolrCloud terminology). We don't need
> that, so we don't have a bunch of copies. The only reason we have two
> copies is so we can handle hardware failure gracefully. I bypassed the
> load balancer for these tests.
>
>> $ Is it possible to run with 10 & 5 & 2 threads?
>
> Sure.
>
> I have updated the gist with those results.
>
> https://gist.github.com/elyograg/abedf4ae28467059e46781f7d474f379
>
>> $ What was the server utilisation (CPU, memory) when you ran the test?
>
> I actually never looked when I was running the tests before. I ran
> additional tests so I could gather that data. The updated gist has
> vmstat information for the server side (while running a 20 thread test,
> and while running a 200 thread test). The server named idxa1 has a
> higher CPU load because it is aggregating the shard data and replying
> to the query, in addition to serving three of the seven shards. The
> server named idxa2 has four shards. The extra shard on idxa2 is very
> small - a little over 321000 docs, a little over 500MB disk used. This
> is where new docs are written.
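>
> As a rough illustration of why the aggregating node works harder,
> consider a manually sharded query like the sketch below: the node that
> receives it fans the request out to every shard listed in the "shards"
> parameter, then merges the per-shard results into a single response.
> The hostnames and core names are placeholders, not the actual layout:
>
> import requests  # third-party HTTP library, assumed installed
>
> # Placeholder shard locations -- not the real idxa1/idxa2 layout.
> shards = ",".join([
>     "idxa1.example.com:8983/solr/s1",
>     "idxa1.example.com:8983/solr/s2",
>     "idxa1.example.com:8983/solr/s3",
>     "idxa2.example.com:8983/solr/s4",
> ])
>
> # The receiving node runs the per-shard queries and does the merge
> # (aggregation) itself; that merge step is what Solr never caches.
> resp = requests.get(
>     "http://idxa1.example.com:8983/solr/s1/select",
>     params={"q": "banjo", "shards": shards, "rows": 10, "wt": "json"},
> )
> print(resp.json()["response"]["numFound"])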
>
> The CPU load on idxa2 is similar for both thread levels. I think this
> is because all queries are served from cache. But idxa1 shows a higher
> load, because even when the cache is used, that server must still
> aggregate the shard data (which was pulled from cache) and create
> responses. The aggregation is not cached, because Solr has no way to
> know that what it is receiving from the shards is cached data.
>
> Here's the benchmark output from the 200 thread test when I was
> getting the CPU information:
>
> query count: 200000
> elapsed count: 200000
> query median: 488.0
> elapsed median: 500.0
> query 75th: 674.0
> elapsed 75th: 686.0
> query 95th: 1006.0
> elapsed 95th: 1018.0
> query 99th: 1283.01
> elapsed 99th: 1299.0
> total time in seconds: 542
> numThreads: 200
> queries per thread: 1000
> qps: 369
>
>> $ The 'query median' increases from 35 to 470 as you increase threads
>> from 20 to 200 (You had mentioned earlier that QTime for the Banjo
>> query was 11 when you had hit it the second time around)
>
> When I got 11 ms, that was doing *one* query. This program does a lot
> of them, so I'm not surprised by the increase. I did the one-off
> queries on the dev server, not the standby production servers that
> received the load test. The hardware specs are similar, except that in
> dev, the entire index is on one server running Solr 6.6.2. That server
> also contains other indexes not being handled by the production pair I
> used for the load test.
>
>> $ Can you please give the Linux server configuration if possible?
>
> What *exactly* are you looking for here? I've got some information
> below, but I do not know if it's what you are after.
>
> High level, first server (idxa1):
> Dell PowerEdge 2950 III
> Two 4-core CPUs
> model name : Intel(R) Xeon(R) CPU E5440 @ 2.83GHz
> 64GB memory
> Solr is version 4.7.2, with an 8GB heap
> About 140GB of index data
> CentOS 6, kernel 2.6.32-431.11.2.el6.centos.plus.x86_64
> Oracle Java:
> java version "1.7.0_72"
> Java(TM) SE Runtime Environment (build 1.7.0_72-b14)
> Java HotSpot(TM) 64-Bit Server VM (build 24.72-b04, mixed mode)
>
> Differences on the second server (idxa2):
> model name : Intel(R) Xeon(R) CPU E5420 @ 2.50GHz
> Slightly more (about 500MB) index data
> Kernel 2.6.32-504.12.2.el6.centos.plus.x86_64
>
> The whole production index is in the ballpark of 280GB, and contains
> over 187 million docs. The dev server has more than 188 million docs.
> I think the reason the counts differ is that we very recently deleted
> a bunch of data from the database, but skipped the corresponding
> update of the Solr index. The production indexes have been rebuilt
> since the delete, but the dev index hasn't.
>
> The network between the client running the test and the Solr servers
> includes a layer 3 switch, some layer 2 switches, and a firewall. All
> of the network hardware is made by Cisco. The entire path (including
> the firewall) is gigabit.
>
> Thanks,
> Shawn
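
P.S. Here is a minimal sketch of the kind of threaded benchmark I mean. It is not Shawn's actual test program; the Solr URL and the query are placeholders. It reports the same kind of numbers as the gist: latency percentiles over all queries, and qps computed as total queries divided by wall-clock seconds (e.g. 200000 / 542 ~= 369):

import statistics
import threading
import time
import urllib.request

# Placeholders -- point these at your own Solr instance and query mix.
SOLR_URL = "http://localhost:8983/solr/core/select?q=banjo&wt=json"
NUM_THREADS = 10
QUERIES_PER_THREAD = 1000

latencies = []           # per-query latency in milliseconds
lock = threading.Lock()  # guards the shared latency list

def worker():
    for _ in range(QUERIES_PER_THREAD):
        start = time.monotonic()
        with urllib.request.urlopen(SOLR_URL) as resp:
            resp.read()
        ms = (time.monotonic() - start) * 1000.0
        with lock:
            latencies.append(ms)

threads = [threading.Thread(target=worker) for _ in range(NUM_THREADS)]
wall_start = time.monotonic()
for t in threads:
    t.start()
for t in threads:
    t.join()
wall = time.monotonic() - wall_start

pct = statistics.quantiles(latencies, n=100)  # pct[i-1] is the i-th percentile
print("query count:", len(latencies))
print("query median:", statistics.median(latencies))
print("query 75th:", pct[74])
print("query 95th:", pct[94])
print("query 99th:", pct[98])
print("total time in seconds:", round(wall))
print("qps:", round(len(latencies) / wall))  # e.g. 200000 / 542 ~= 369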
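
For the vmstat capture requested above, a tiny wrapper on the server side would do; the file names and the 10-minute sampling window are just examples:

import subprocess
import time

# One "vmstat 5" log per thread level (5, 7, 8, 10 in the requests above).
# Run this on the Solr server while the client machine drives the load.
for threads in (5, 7, 8, 10):
    with open("vmstat-%dthreads.log" % threads, "w") as log:
        vm = subprocess.Popen(["vmstat", "5"], stdout=log)
        time.sleep(600)  # sample for ~10 minutes while one test level runs
        vm.terminate()
        vm.wait()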