Greetings all,

We are working on building up a large Solr index of over 300 million records, and this is our first look at Solr. We are currently running a set of unique search queries for about 30 minutes against a single server (no replication, no indexing going on at the same time, and no distributed search) holding a fixed number of records (in our case, 10 million records in the index). Nearly all of our searches are unique: I say "nearly" because our set of queries is unique, but I have not yet confirmed that JMeter is selecting these queries without replacement.
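One quick way to confirm whether any query was issued more than once is to count the query strings that were actually sent. This matters because an exact repeat could be answered from Solr's caches and look artificially fast. A minimal sketch, assuming the issued queries have already been extracted, one per entry, from the JMeter results file or the Solr request log (that extraction step is not shown, and the function name is hypothetical):

```python
from collections import Counter


def repeated_queries(issued):
    """Return {query: count} for every query string issued more than once.

    `issued` is an iterable of raw query strings (hypothetical input format).
    Surrounding whitespace is stripped so trivially different copies still match.
    """
    counts = Counter(q.strip() for q in issued)
    return {q: c for q, c in counts.items() if c > 1}


if __name__ == "__main__":
    issued = ["title:foo", "author:bar", "title:foo ", "isbn:123"]
    print(repeated_queries(issued))  # {'title:foo': 2}
```

An empty result means the run really was sampled without replacement.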

We are striving for an average response time of 2 seconds, and on average we are meeting it: the average response time is well under 2 seconds per query. Unfortunately, about once every 6 minutes or so (it is not a regular event exactly six minutes apart; it is "about" six minutes, but the interval fluctuates), a single query takes something like 15 to 20 seconds to return.
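To illustrate why the average looks fine while individual queries do not: a handful of 15-20 second outliers barely moves the mean, or even the 99th percentile, of a 30-minute run. A minimal sketch, assuming the response times have been exported (for example from a JMeter results file) as `(timestamp_seconds, elapsed_ms)` pairs; the function name and the 10-second spike threshold are hypothetical choices for illustration, and the sample data below is synthetic:

```python
import statistics


def summarize(samples, spike_ms=10_000):
    """Summarize (timestamp_s, elapsed_ms) samples and list outliers above spike_ms."""
    if not samples:
        raise ValueError("need at least one sample")
    elapsed = sorted(ms for _, ms in samples)
    n = len(elapsed)

    def pct(p):
        # Nearest-rank percentile: rank = ceil(p * n / 100), 1-based.
        rank = max(1, -(-p * n // 100))
        return elapsed[rank - 1]

    # Start times of the slow queries, and the gaps between consecutive spikes.
    spikes = sorted(ts for ts, ms in samples if ms > spike_ms)
    gaps = [b - a for a, b in zip(spikes, spikes[1:])]
    return {
        "mean_ms": statistics.mean(elapsed),
        "p50_ms": pct(50),
        "p99_ms": pct(99),
        "max_ms": elapsed[-1],
        "spike_times_s": spikes,
        "spike_gaps_s": gaps,
    }


if __name__ == "__main__":
    # Synthetic run: one query per second for 30 minutes, all 1.5 s except three 18 s spikes.
    samples = [(t, 18_000 if t in (360, 750, 1100) else 1_500) for t in range(1800)]
    report = summarize(samples)
    # mean is 1527.5 ms and p99 is 1500 ms; only max_ms and the spike list expose the problem.
    print(report)
```

Reporting the maximum and the per-spike timestamps alongside the mean is what surfaces the spikes; the gaps list also shows directly how "regular" the roughly six-minute interval is.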

We have been trying to identify what is causing this "spike" every so often and we are completely baffled. What we have done thus far:

1) Looked through the SAR logs and have not seen anything that correlates to this issue.
2) Tracked the JVM statistics, especially the garbage collections; no correlations there either.
3) Examined the queries; no pattern is obvious there.
4) Played with the JVM memory settings (heap settings, cache settings, and any other settings we could find).
5) Changed hardware (a brand-new 4-processor, 8 GB RAM server with a fresh install of Red Hat Enterprise Linux 5.7; a large instance on AWS EC2; a fresh VMware-based virtual machine from our own data center), and still nothing is giving us a clue as to what is causing these "spikes".
6) Found no correlation between the number of hits returned and the spikes.
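The SAR and garbage-collection checks above can be made repeatable by lining up each slow query's run interval against another timestamped event stream (GC pause start times, SAR samples, and so on) and listing the events that fell inside it. A minimal sketch, assuming both series are already parsed into seconds on the same clock; parsing the GC log or SAR output is not shown, and the function name and the 1-second `slack_s` tolerance for clock skew are hypothetical:

```python
from bisect import bisect_left, bisect_right


def events_during_spikes(spikes, event_times, slack_s=1.0):
    """Map each slow query's start time to the events that occurred while it ran.

    spikes: list of (start_s, elapsed_ms) for queries judged slow.
    event_times: timestamps in seconds from another log (e.g. GC pause starts).
    slack_s: tolerance added on both sides of each query's run interval.
    """
    events = sorted(event_times)
    overlaps = {}
    for start, elapsed_ms in spikes:
        # The query ran from `start` to `start + elapsed`; widen by slack_s each side.
        lo = bisect_left(events, start - slack_s)
        hi = bisect_right(events, start + elapsed_ms / 1000 + slack_s)
        overlaps[start] = events[lo:hi]
    return overlaps


if __name__ == "__main__":
    spikes = [(360, 18_000), (750, 18_000), (1100, 18_000)]
    gc_pauses = [100, 365, 900, 1105.5]  # synthetic timestamps
    result = events_during_spikes(spikes, gc_pauses)
    print(result)  # {360: [365], 750: [], 1100: [1105.5]}
```

A match rate only means something relative to chance: if events of that kind occur every few seconds, nearly every 15-20 second window will contain one, so it is worth running the same check against randomly chosen fast queries as a baseline.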


Our data is very simple and so are the queries. The schema consists of 40 fields, most of which are "string" fields, 2 of which are "location" fields, and a small handful of which are integer fields. All fields are indexed and all fields are stored.

Our queries are also rather simple. Many of the queries are a simple one-field search; the most complex query we have is a 3-field search. Again, no correlation has been established between the queries and these spikes. Also, about 60% of our queries return zero hits. Some zero-hit queries were intended, on the assumption that we want to make Solr search its entire index every so often, but 60% is more than we intended and we will fix that soon. Again, no correlation has been found between the spikes and the zero-hit queries.

For some time we were testing with 100 million records in the index, and the aggregate data looked quite good: most queries were returning in under 2 seconds. Unfortunately, when we looked at the individual data points, we found spikes every 6-8 minutes or so, sometimes as high as 150 seconds!

We have been testing with 100 million, 50 million, 25 million, 20 million, 15 million, and 10 million records in the index. As I indicated at the start, we are now at 10 million records, with 15-20 second spikes.

As we have decreased the number of records in the index, the size (but not the frequency) of the spikes has been dropping.

My question is: is this type of behavior normal for Solr when it is being overstressed? I've read of lots of people with far more complicated schemas running MORE than 10 million records in an index who have never once complained about these spikes. Since I am new at this, I am not sure what Solr's "failure mode" looks like when it has too many records to search.

I am hoping someone reading this note can at least point me in another direction to look. Searching 10 million records in less than 2 seconds most of the time is great, but those 10- and 20-second spikes are not going to go over well with our customers, and I suspect there is more we should be able to do here.

Thanks.

Peter S. Lee
ProQuest
