Looks like I've opened up a very interesting can of worms... Thank you to all who are posting to this thread; I'm learning a lot. The way I see it now, a single Solr instance on this machine seems like the most sensible choice. Then, as an upgrade path, I can add inexpensive machines. That gives me storage space and CPU power, and starts building toward a parallelized, load-balanced cluster.
Greetz

On Sat, Mar 17, 2018 at 11:23 AM, Deepak Goel <deic...@gmail.com> wrote:

> On 17 Mar 2018 05:19, "Walter Underwood" <wun...@wunderwood.org> wrote:
>
>> On Mar 16, 2018, at 3:26 PM, Deepak Goel <deic...@gmail.com> wrote:
>>
>>> Can you please post results of your test?
>>>
>>> Please tell us the tps at 25%, 50%, 75%, 100% of your CPU resource
>>
>> I could, but it probably would not be useful for your documents or your
>> queries.
>>
>> We have 22 million homework problems. Our queries are often hundreds of
>> words long, because students copy and paste entire problems. After
>> pre-processing, the average query is still 25 words.
>>
>> For load benchmarking, I use access logs from production. I typically
>> gather over a half-million lines of log. Using production logs means that
>> queries have the same statistical distribution as prod, so the cache hit
>> rates are reasonable.
>>
>> Before each benchmark, I restart all the Solr instances to clear the
>> caches. Then the first part of the query log is used to warm the caches,
>> typically about 4000 queries.
>>
>> After that, the measured benchmark run starts. This uses JMeter with
>> 100-500 threads. Each thread is configured with a constant throughput
>> timer so a constant load is offered. Test runs are one or two hours.
>> Recently, I ran a test with a rate of 1000 requests/minute for one hour.
>>
>> During the benchmark, I monitor the CPU usage. Our systems are configured
>> with enough RAM so that disk is not accessed for search indexes. If the
>> CPU goes over 75-80%, there is congestion and queries will slow down.
>> Also, if the run queue (load average) increases over the number of CPUs,
>> there will be congestion.
>>
>> After the benchmark run, the JMeter log is analyzed to report response
>> time percentiles for each Solr request handler.
>
> Sorry for being rude. But the 'results' please, not the 'road to the
> results'
>
>> wunder
>> Walter Underwood
>> wun...@wunderwood.org
>> http://observer.wunderwood.org/ (my blog)
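For anyone who wants to reproduce the last step Walter describes (analyzing the JMeter log to report response-time percentiles per Solr request handler), here's a minimal sketch. It assumes JMeter's default CSV results format, where each sample line has a `label` column (the request handler) and an `elapsed` column (response time in ms); your `jmeter.properties` may name or enable different columns, so treat the field names as assumptions.

```python
import csv
import io
import statistics

def percentiles_by_handler(csv_text, pcts=(50, 90, 95, 99)):
    """Group JMeter sample lines by label and report elapsed-time percentiles.

    Assumes JMeter's default CSV columns: "label" identifies the request
    handler, "elapsed" is the response time in milliseconds.
    """
    samples_by_label = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        samples_by_label.setdefault(row["label"], []).append(int(row["elapsed"]))

    report = {}
    for label, samples in samples_by_label.items():
        # statistics.quantiles with n=100 yields the 99 percentile cut points,
        # so cut point p-1 is the p-th percentile.
        cuts = statistics.quantiles(samples, n=100)
        report[label] = {p: cuts[p - 1] for p in pcts}
    return report
```

In practice you would stream the results file with `csv.DictReader(open(path))` rather than reading it into memory; with a half-million log lines either approach is fine on a modern machine.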