OK - I figured it out. It's not Solr at all (and I'm not really surprised).
In the prototype benchmarks, we used a different instance of Tomcat than we're
using for production load tests. Our prototype Tomcat instance had no
maxThreads value set, so it was using the default value of 200. The production
Tomcat environment has a maxThreads value of 15 - we were simply running out of
threads and getting 'connection refused' exceptions when we ramped up the
Solr hits past a certain level.
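For anyone who hits the same thing later, here is a rough sketch of the
failure mode. It is not Tomcat code: a plain java.util.concurrent thread pool
stands in for the connector, the class and method names are made up, and the
backlog of 100 is my assumption standing in for the connector's acceptCount
setting. It only shows why a cap of 15 worker threads starts refusing work at
a load that 200 threads absorb easily.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for a servlet container's request thread pool.
public class MaxThreadsSketch {

    /**
     * Submits `requests` tasks that all block until released, and returns
     * how many the pool refused. `acceptCount` is the assumed backlog size.
     */
    static int countRejected(int maxThreads, int acceptCount, int requests) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                maxThreads, maxThreads, 0L, TimeUnit.SECONDS,
                new ArrayBlockingQueue<Runnable>(acceptCount));
        CountDownLatch release = new CountDownLatch(1);
        int rejected = 0;
        try {
            for (int i = 0; i < requests; i++) {
                try {
                    // Each "request" holds its thread, like a slow search would.
                    pool.execute(() -> {
                        try {
                            release.await();
                        } catch (InterruptedException e) {
                            Thread.currentThread().interrupt();
                        }
                    });
                } catch (RejectedExecutionException e) {
                    // All threads busy and the backlog full: work is refused.
                    rejected++;
                }
            }
        } finally {
            release.countDown(); // let the blocked tasks finish
            pool.shutdown();
        }
        return rejected;
    }

    public static void main(String[] args) {
        System.out.println("maxThreads=15:  rejected " + countRejected(15, 100, 200));
        System.out.println("maxThreads=200: rejected " + countRejected(200, 100, 200));
    }
}
```

With 200 simultaneous slow requests, the 15-thread pool runs 15, queues 100,
and rejects the remaining 85; the 200-thread pool rejects none.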
Thanks for considering, Yonik (and any others waiting to see any reply I
made)...
(As others have said - this listserv is great!)
Bob Sandiford | Lead Software Engineer | SirsiDynix
P: 800.288.8020 X6943 | bob.sandif...@sirsidynix.com
www.sirsidynix.com
-----Original Message-----
From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik
Seeley
Sent: Wednesday, June 29, 2011 12:18 PM
To: solr-user@lucene.apache.org
Subject: Re: Solr just 'hangs' under load test - ideas?
Can you get a thread dump to see what is hanging?
-Yonik
http://www.lucidimagination.com
On Wed, Jun 29, 2011 at 11:45 AM, Bob Sandiford
bob.sandif...@sirsidynix.com wrote:
Hi, all.
I'm hoping someone has some thoughts here.
We're running Solr 3.1 (with the patch for SolrQueryParser.java to
not do the getLuceneVersion() calls, but use luceneMatchVersion
directly).
We're running in a Tomcat instance, 64 bit Java. CATALINA_OPTS are:
-Xmx7168m -Xms7168m -XX:MaxPermSize=256M
We're running 2 Solr cores, with the same schema.
We use SolrJ to run our searches from a Java app running in JBoss.
JBoss, Tomcat, and the Solr Index folders are all on the same server.
In case it's relevant, we're using JMeter as a load test harness.
We're running on Solaris, a 16 processor box with 48GB physical
memory.
I've run a successful load test at a 100 user load (at that rate
there are about 5-10 Solr searches / second), and Solr search responses
were coming in under 100ms.
When I tried to ramp up, as far as I can tell, Solr is just hanging.
(We have some logging statements around the SolrJ calls - just before,
we log how long our query construction takes, then we run the SolrJ
query and log the search times. We're getting a number of the query
construction logs, but no corresponding search time logs).
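To make the logging arrangement concrete, here is a minimal sketch of that
kind of timing wrapper. It is not our actual code: the class and method names
are hypothetical, and a plain Callable stands in for the SolrJ query so the
example needs no Solr libraries. The one deliberate difference is that the
second log line is written in a finally block, so a call that throws still
leaves a "finished" entry; a call that genuinely never returns would still
leave only the "started" entry.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.TimeUnit;

// Hypothetical helper: logs before a call and after it, even on failure.
public class TimedCallSketch {

    static <T> T timed(String label, Callable<T> call, List<String> log) throws Exception {
        long start = System.nanoTime();
        log.add(label + " started");
        try {
            return call.call();
        } finally {
            // Runs whether the call returned normally or threw an exception.
            long elapsedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
            log.add(label + " finished after " + elapsedMs + " ms");
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> log = new ArrayList<>();
        // The lambda is a placeholder for the real search call.
        String result = timed("search", () -> "3 hits", log);
        System.out.println(result);
        for (String line : log) {
            System.out.println(line);
        }
    }
}
```

If the "started" lines show up without matching "finished" lines even with
this arrangement, the call is blocked rather than failing.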
Symptoms:
The Tomcat and JBoss processes show as well under 1% CPU, and they
are still the top processes. CPU states show around 99% idle. RES
usage for the two Java processes around 3GB each. LWP under 120 for
each. STATE just shows as sleep. JBoss is still 'alive', as I can get
into a piece of software that talks to our JBoss app to get data.
We set things up to use log4j logging for Solr - the log isn't
showing any errors or exceptions.
We're not indexing - just searching.
Back in January, we did load testing on a prototype, and had no
problems (though that was Solr 1.4 at the time). It ramped up
beautifully - bottlenecks were our apps, not Solr. What I'm
benchmarking now is a descendant of that prototyping - a bit more
complex on searches and more fields in the schema, but same basic
search logic as far as SolrJ usage.
Any ideas? What else to look at? Ringing any bells?
I can send more details if anyone wants specifics...
Bob Sandiford | Lead Software Engineer | SirsiDynix
P: 800.288.8020 X6943 | bob.sandif...@sirsidynix.com
www.sirsidynix.com