Strange "spikes" in query response times...any ideas where else to look?

solr Thu, 28 Jun 2012 16:56:57 -0700

Greetings all,

We are working on building up a large Solr index for over 300 million records...and this is our first look at Solr. We are currently running a set of unique search queries against a single server (so no replication, no indexing going on at the same time, and no distributed search) with a set number of records (in our case, 10 million records in the index) for about 30 minutes, with nearly all of our searches being unique (I say "nearly" because our set of queries is unique, but I have not yet confirmed that JMeter is selecting these queries without replacement).
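For the "nearly unique" caveat above, a quick check along these lines would settle whether any query was issued more than once. This is only a sketch: the query strings are made up, and it assumes the issued queries can be pulled out of the JMeter results as a plain list of strings.

```python
from collections import Counter

def repeated_queries(queries):
    """Return {query: count} for every query that was issued more than once."""
    counts = Counter(queries)
    return {q: n for q, n in counts.items() if n > 1}

# Hypothetical sample of issued queries (not our real data)
issued = ["title:foo", "author:bar", "title:foo", "year:1999"]
print(repeated_queries(issued))
```

An empty result would mean the queries really were selected without replacement.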

We are striving for a 2 second response time on the average, and indeed we are pretty darned close. In fact, if you look at the average response time, we are well under 2 seconds per query. Unfortunately, we are seeing that about once every 6 minutes or so (it is not a regular event exactly six minutes apart...it is "about" six minutes, but it fluctuates) we get a single query that returns in something like 15 to 20 seconds.
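To show how a healthy average can hide these outliers, here is a minimal sketch of the kind of summary involved. It is not our actual tooling: the sample values and the 10-second spike threshold are invented for illustration, and in practice the (timestamp, elapsed) pairs would come from the JMeter results file.

```python
from statistics import mean

def summarize(samples, spike_threshold_s=10.0):
    """samples: list of (timestamp_s, elapsed_s) tuples, sorted by timestamp."""
    elapsed = [e for _, e in samples]
    # A "spike" is any query at or above the (assumed) threshold
    spikes = [(t, e) for t, e in samples if e >= spike_threshold_s]
    # Seconds between consecutive spikes
    gaps = [b[0] - a[0] for a, b in zip(spikes, spikes[1:])]
    return {
        "mean_s": mean(elapsed),
        "max_s": max(elapsed),
        "spikes": spikes,
        "gaps_s": gaps,
    }

# Hypothetical run: one 1.0 s query per second for 15 minutes, with two slow outliers
samples = [(t, 1.0) for t in range(900)]
samples[200] = (200, 18.0)
samples[560] = (560, 16.0)

report = summarize(samples)
print(report["mean_s"], report["max_s"], report["gaps_s"])
```

With data shaped like this the mean stays close to 1 second even though two queries took over 15 seconds, which is why the spikes only show up when looking at the individual data points.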

We have been trying to identify what is causing this "spike" every so often and we are completely baffled. What we have done thus far:

1) Looked through the SAR logs and have not seen anything that correlates to this issue

2) Tracked the JVM statistics...especially the garbage collections...no correlations there either

3) Examined the queries...no pattern obvious there

4) Played with the JVM memory settings (heap settings, cache settings, and any other settings we could find)

5) Changed hardware (a brand new 4 processor, 8 gig RAM server with a fresh install of Redhat 5.7 enterprise; a large instance of AWS EC2; a fresh instance of a VMWare based virtual machine from our own data center) and still nothing is giving us a clue as to what is causing these "spikes"

6) No correlation found between the number of hits returned and the spikes
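For reference, the kind of correlation check meant in the list above can be sketched as follows. This is an illustration under assumptions, not the exact procedure we ran: the timestamps are invented, the 5-second window is arbitrary, and event_times stands in for whatever is being compared against (for example garbage collection pauses or SAR samples over some threshold).

```python
from bisect import bisect_left

def spikes_near_events(spike_times, event_times, window_s=5.0):
    """Return the spike timestamps that have at least one event within +/- window_s."""
    events = sorted(event_times)
    matched = []
    for t in spike_times:
        # First event at or after the start of the window around this spike
        i = bisect_left(events, t - window_s)
        if i < len(events) and events[i] <= t + window_s:
            matched.append(t)
    return matched

# Hypothetical timestamps in seconds since the start of the test
spike_times = [200, 560, 930]
gc_times = [50, 198, 700]
print(spikes_near_events(spike_times, gc_times))
```

If only a small fraction of the spikes land near an event, as in this made-up example, that source is probably not the cause.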

Our data is very simple and so are the queries. The schema consists of 40 fields, most of which are "string" fields, 2 of which are "location" fields, and a small handful of which are integer fields. All fields are indexed and all fields are stored.

Our queries are also rather simple. Many of the queries are a simple one-field search. The most complex query we have is a 3-field search. Again, no correlation has been established between the query and these spikes. Also, about 60% of our queries return zero hits (on the assumption that we want to make Solr search its entire index every so often). 60% is more than we intended and we will fix that soon...but that is what is currently happening. Again, no correlation was found between the spikes and the queries that returned 0 hits.

For some time we were testing with 100 million records in the index and the aggregate data looked quite good. Most queries were returning in under 2 seconds. Unfortunately, it was when we looked at the individual data points that we found spikes every 6-8 minutes or so hitting sometimes as high as 150 seconds!

We have been testing with 100 million records in the index, 50 million records in the index, 25 million, 20 million, 15 million, and 10 million records. As I indicated at the start, we are now at 10 million records with 15-20 second spikes.

As we have decreased the number of records in the index, the size (but not the frequency) of the spikes has been dropping.

My question is: Is this type of behavior normal for Solr when it is being overstressed? I've read of lots of people with far more complicated schemas running MORE than 10 million records in an index who never once complained about these spikes. Since I am new at this, I am not sure what Solr's "failure mode" looks like when it has too many records to search.

I am hoping someone looking at this note can at least give me another direction to look. 10 million records searched in less than 2 seconds most of the time is great...but those 10 and 20 second spikes are not going to go over well with our customers...and I somehow think there is more we should be able to do here.


Thanks.

Peter S. Lee
ProQuest
