RE: Strange "spikes" in query response times...any ideas where else to look?

solr Thu, 28 Jun 2012 18:20:34 -0700

Michael,

Thank you for responding...and for the excellent questions.

1) We have never seen this response time spike with a user-interactive search. However, in the span of about 40 minutes, which included about 82,000 queries, we only saw a handful of near-equally distributed "spikes". We have tried sending queries from the admin tool while the test was running, but given those odds, I'm not surprised we've never "hit" on one of those few spikes we are seeing in the test results.

2) Good point and I should have mentioned this. We are using multiple methods to track these response times.
a) Looking at the catalina.out file and plotting the response times recorded there (I think this is logging the QTime as seen by Solr).
b) Looking at what JMeter is reporting as response times. In general, these are very close if not identical to what is being seen in the catalina.out file. I have not run a line-by-line comparison, but putting the query response graphs next to each other shows them to be nearly (or possibly exactly) the same. Nothing looked out of the ordinary.
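
A line-by-line comparison could start by pulling the QTime values out of catalina.out. The following is a minimal Python sketch, assuming the log lines carry a `QTime=<milliseconds>` token; the sample lines and the function name `extract_qtimes` are made up for illustration, and real log content will differ.

```python
import re

# Matches the QTime token (milliseconds) in a request log line.
QTIME_RE = re.compile(r"\bQTime=(\d+)\b")

def extract_qtimes(lines):
    """Return the QTime (ms) of every log line that carries one."""
    qtimes = []
    for line in lines:
        match = QTIME_RE.search(line)
        if match:
            qtimes.append(int(match.group(1)))
    return qtimes

# Hypothetical sample lines; real catalina.out content will differ.
sample = [
    "INFO: [] webapp=/solr path=/select params={q=title:foo} hits=0 status=0 QTime=12",
    "INFO: Some unrelated log message",
    "INFO: [] webapp=/solr path=/select params={q=title:bar} hits=3 status=0 QTime=15230",
]
qtimes = extract_qtimes(sample)
print(qtimes)
```

The resulting list can then be lined up against the per-request times JMeter reports for the same run.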

3) We are using multiple threads. Before your email I was looking at the results, doing some math, and double checking the reports from JMeter. I did notice that our throughput is much higher than we meant for it to be. JMeter is set up to run 15 threads from a single test machine...but I noticed that the JMeter report is showing close to 47 queries per second. We are only targeting TWO to FIVE queries per second. This is up next on our list of things to look at and how to control more effectively. We do have three separate machines set up for JMeter testing and we are investigating to see if perhaps all three of these machines are inadvertently being launched during the test at one time and overwhelming the server. This *might* be one facet of the problem. Agreed on that.
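
The gap between the target and observed rates can be checked with simple arithmetic. Here is a minimal Python sketch using only the numbers quoted above (15 threads, about 47 queries per second observed, 2 to 5 targeted). The function names are hypothetical, and it says nothing about how JMeter itself should be configured.

```python
def per_thread_interval(threads, target_qps):
    """Seconds each thread must leave between the starts of its own queries
    so that all threads together issue target_qps queries per second."""
    return threads / target_qps

def implied_cycle_time(threads, observed_qps):
    """Average seconds per query cycle per thread implied by the observed
    throughput (response time plus any pause between queries)."""
    return threads / observed_qps

threads = 15
print(per_thread_interval(threads, 5))   # pacing needed for 5 queries/second
print(per_thread_interval(threads, 2))   # pacing needed for 2 queries/second
print(round(implied_cycle_time(threads, 47), 2))  # what 47 queries/second implies
```

With 15 threads, each thread would need to start a query only once every 3 to 7.5 seconds to stay within 2 to 5 queries per second. By contrast, 47 queries per second implies each thread completes a cycle roughly every 0.32 seconds.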

Even as we investigate this last item regarding the number of users/threads, I wouldn't mind any other thoughts you or anyone else had to offer. We are checking on this user/threads issue and for the sake of anyone else who finds this discussion useful I'll note what we find.


Thanks again.

Peter S. Lee
ProQuest

Quoting Michael Ryan <mr...@moreover.com>:

A few questions...

1) Do you only see these spikes when running JMeter? I.e., do you ever see a spike when you manually run a query?

2) How are you measuring the response time? In my experience there are three different ways to measure query speed. Usually all of them will be approximately equal, but in some situations they can be quite different, and this difference can be a clue as to where the bottleneck is:

  1) The response time as seen by the end user (in this case, JMeter)

  2) The response time as seen by the container (for example, in Jetty you can get this by enabling logLatency in jetty.xml)

  3) The "QTime" as returned in the Solr response
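
As an illustration of how the three numbers might be compared, here is a small Python sketch. It assumes all three times are already available in milliseconds for the same request. The function name, the key names, and the reading of each gap are illustrative assumptions rather than anything Solr or Jetty provides.

```python
def timing_gaps(client_ms, container_ms, qtime_ms):
    """Split one request's end-to-end time into three rough segments."""
    return {
        # Time spent outside the container: network, client-side queuing, etc.
        "outside_container_ms": client_ms - container_ms,
        # Time in the container not counted in QTime, e.g. writing the response.
        "container_overhead_ms": container_ms - qtime_ms,
        # Time Solr itself reports for the query.
        "qtime_ms": qtime_ms,
    }

# Hypothetical numbers for a single slow request.
gaps = timing_gaps(client_ms=15200, container_ms=15150, qtime_ms=300)
print(max(gaps, key=gaps.get))
```

Whichever segment dominates during a spike suggests where to look next.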

3) Are you running multiple queries concurrently, or are you just using a single thread in JMeter?


-Michael

-----Original Message-----
From: s...@isshomefront.com [mailto:s...@isshomefront.com]
Sent: Thursday, June 28, 2012 7:56 PM
To: solr-user@lucene.apache.org

Subject: Strange "spikes" in query response times...any ideas where else to look?


Greetings all,

We are working on building up a large Solr index for over 300 million
records...and this is our first look at Solr. We are currently running
a set of unique search queries against a single server (so no
replication, no indexing going on at the same time, and no distributed
search) with a set number of records (in our case, 10 million records
in the index) for about 30 minutes, with nearly all of our searches
being unique (I say "nearly" because our set of queries is unique, but
I have not yet confirmed that JMeter is selecting these queries with
no replacement).
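
Whether any query was actually sent more than once could be checked from the list of queries issued during a run. This is a minimal Python sketch, assuming that list can be exported as plain strings; the function name and the sample queries are hypothetical.

```python
from collections import Counter

def repeated_queries(queries):
    """Return a dict of query -> count for queries sent more than once."""
    counts = Counter(queries)
    return {query: n for query, n in counts.items() if n > 1}

# Hypothetical queries as they might appear in a test log.
sent = ["title:foo", "author:bar", "title:foo", "year:2012"]
repeats = repeated_queries(sent)
print(repeats)
```

An empty result would mean every query in the run was unique.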

We are striving for a 2 second response time on the average, and
indeed we are pretty darned close. In fact, if you look at the average
response time, we are well under the 2 seconds per query.
Unfortunately, we are seeing that about once every 6 minutes or so
(and it is not a regular event...exactly six minutes apart...it is
"about" six minutes but it fluctuates) we get a single query that
returns in something like 15 to 20 seconds.
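
One way to pin down how irregular the "about six minutes" really is would be to list the spikes and the gaps between them. This is a minimal Python sketch, assuming the results are available as (timestamp in seconds, response time in seconds) pairs; the 10 second threshold and the sample data are made up for illustration.

```python
def find_spikes(samples, threshold_s=10.0):
    """Return timestamps of samples whose response time exceeds threshold_s."""
    return [ts for ts, elapsed in samples if elapsed > threshold_s]

def gaps_between(timestamps):
    """Return the seconds between consecutive timestamps."""
    return [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]

# Hypothetical (timestamp_s, response_time_s) pairs.
samples = [(0, 0.4), (350, 17.2), (351, 0.6), (730, 15.8), (1075, 19.1), (1100, 1.1)]
spikes = find_spikes(samples)
print(spikes)
print(gaps_between(spikes))
```

The list of gaps can then be compared against SAR, garbage collection, or other periodic events on the server.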

We have been trying to identify what is causing this "spike" every so
often and we are completely baffled. What we have done thus far:

1) Looked through the SAR logs and have not seen anything that
correlates to this issue
2) Tracked the JVM statistics...especially the garbage
collections...no correlations there either
3) Examined the queries...no pattern obvious there
4) Played with the JVM memory settings (heap settings, cache settings,
and any other settings we could find)
5) Changed hardware (brand new 4 processor, 8 gig RAM server with a
fresh install of Redhat 5.7 enterprise; tried on a large instance of
AWS EC2; tried on a fresh instance of a VMWare based virtual machine
from our own data center) and still nothing is giving us a clue as to
what is causing these "spikes"
6) No correlation found between the number of hits returned and the spikes


Our data is very simple and so are the queries. The schema consists of
40 fields, most of which are "string" fields, 2 of which are
"location" fields, and a small handful of which are integer fields.
All fields are indexed and all fields are stored.

Our queries are also rather simple. Many of the queries are a simple
one-field search. The most complex query we have is a 3-field search.
Again, no correlation has been established between the query and these
spikes. Also, about 60% of our queries return zero hits (on the
assumption that we want to make Solr search its entire index every so
often). 60% is more than we intended and we will fix that soon...but
that is what is currently happening. Again, no correlation found
between spikes and 0-hit returned queries.

For some time we were testing with 100 million records in the index
and the aggregate data looked quite good. Most queries were returning
in under 2 seconds. Unfortunately, it was when we looked at the
individual data points that we found spikes every 6-8 minutes or so
hitting sometimes as high as 150 seconds!

We have been testing with 100 million records in the index, 50 million
records in the index, 25 million, 20 million, 15 million, and 10
million records. As I indicated at the start, we are now at 10
million records with 15-20 second spikes.

As we have decreased the number of records in the index, the size (but
not the frequency) of the spikes has been dropping.

My question is: Is this type of behavior normal for Solr when it is
being overstressed? I've read of lots of people with far more
complicated schemas running MORE than 10 million records in an index
who never once complained about these spikes. Since I am new at this,
I am not sure what Solr's "failure mode" looks like when it has too
many records to search.

I am hoping someone looking at this note can at least give me another
direction to look. 10 million records searched in less than 2 seconds
most of the time is great...but those 10 and 20 second spikes are not
going to go over well with our customers...and I somehow think there
is more we should be able to do here.

Thanks.

Peter S. Lee
ProQuest
