Thanks to you and Otis for the suggestions!  Some more information:

- Based on the Solr stats page, my caches seem to be working pretty well (few 
or no evictions, hit rates in the 75-80% range).
- VuFind is actually doing two Solr queries per search (one initial search 
followed by a supplemental spell check search -- I believe this is necessary 
because VuFind has two separate spelling indexes, one for shingled terms and 
one for single words).  That is probably exacerbating the problem, though based 
on searches with debugQuery on, it looks like it's always the initial search 
(rather than the supplemental spelling search) that's consuming the bulk of the 
time.
- enableLazyFieldLoading is set to true.
- I'm retrieving 20 documents per page.
- My JVM settings: -server -Xloggc:/usr/local/vufind/solr/jetty/logs/gc.log 
-Xms4096m -Xmx4096m -XX:+UseParallelGC -XX:+UseParallelOldGC -XX:NewRatio=5
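
As a rough sanity check on those heap flags: -XX:NewRatio=5 makes the old generation five times the size of the young generation, so the young generation gets about one sixth of the 4096 MB heap. A small Python sketch of that arithmetic (split_heap is just an illustrative helper name, not part of any tool):

```python
def split_heap(heap_mb, new_ratio):
    """Return (young_mb, old_mb) for a heap sized with -XX:NewRatio.

    NewRatio is the ratio old:young, so young = heap / (new_ratio + 1).
    """
    young_mb = heap_mb / (new_ratio + 1)
    old_mb = heap_mb - young_mb
    return young_mb, old_mb


# Values taken from the JVM settings above: -Xmx4096m, -XX:NewRatio=5.
young, old = split_heap(4096, 5)
print(f"young ~ {young:.0f} MB, old ~ {old:.0f} MB")
```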

It appears that a large portion of my problem had to do with autowarming, a 
topic that I've never had a strong grasp on, though perhaps I'm finally 
learning (any recommended primer links would be welcome!).  I did have some 
autowarming settings in solrconfig.xml (an arbitrary search for a bunch of 
random keywords in the newSearcher and firstSearcher events, plus autowarmCount 
settings on all of my caches).  However, when I looked at the debugQuery 
output, I noticed that a huge amount of time was being wasted loading facets on 
the first search after restarting Solr, so I changed my newSearcher and 
firstSearcher events to this:

      <arr name="queries">
        <lst>
          <str name="q">*:*</str>
          <str name="start">0</str>
          <str name="rows">10</str>
          <str name="facet">true</str>
          <str name="facet.mincount">1</str>
          <str name="facet.field">collection</str>
          <str name="facet.field">format</str>
          <str name="facet.field">publishDate</str>
          <str name="facet.field">callnumber-first</str>
          <str name="facet.field">topic_facet</str>
          <str name="facet.field">authorStr</str>
          <str name="facet.field">language</str>
          <str name="facet.field">genre_facet</str>
          <str name="facet.field">era_facet</str>
          <str name="facet.field">geographic_facet</str>
        </lst>
      </arr>
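
To time the same facet load by hand, the warming query can also be sent as an ordinary request with debugQuery turned on. A minimal Python sketch that builds such a URL is below; the host, port, and core path are assumptions (adjust them for the real installation), and warming_url is only an illustrative helper name.

```python
from urllib.parse import urlencode

# Facet fields copied from the warming query above.
FACET_FIELDS = [
    "collection", "format", "publishDate", "callnumber-first",
    "topic_facet", "authorStr", "language", "genre_facet",
    "era_facet", "geographic_facet",
]


def warming_url(base_url):
    """Build a select URL that mirrors the warming query, with debug on."""
    params = [
        ("q", "*:*"),
        ("start", "0"),
        ("rows", "10"),
        ("facet", "true"),
        ("facet.mincount", "1"),
    ]
    # facet.field is repeated once per field, exactly as in the XML.
    params += [("facet.field", name) for name in FACET_FIELDS]
    params.append(("debugQuery", "on"))
    return base_url + "?" + urlencode(params)


# Assumed Solr location; not taken from the configuration above.
url = warming_url("http://localhost:8080/solr/biblio/select")
print(url)
```

Requesting that URL right after a restart, and then again a moment later, should show roughly how much of the first-search cost comes from loading the facet fields.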

Overall performance has increased dramatically, and the biggest bottleneck 
in the debug output now seems to be the shingle spell checking!
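
For comparing components, the timing section of the debug output can be ranked with a few lines of Python. This is only a sketch: it assumes the response was requested with wt=json and that the timing data is laid out as debug, then timing, then a phase such as process, with one entry per component carrying a time value. Check that against real output before relying on it. The sample numbers and shortened component names are made up for illustration.

```python
def slowest_components(response, phase="process"):
    """Return (component, milliseconds) pairs, slowest first."""
    timing = response["debug"]["timing"][phase]
    rows = [
        (name, entry["time"])
        for name, entry in timing.items()
        if isinstance(entry, dict)  # skip the phase's own "time" total
    ]
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Made-up numbers, for illustration only; not measurements from this system.
sample = {
    "debug": {
        "timing": {
            "time": 120.0,
            "process": {
                "time": 118.0,
                "QueryComponent": {"time": 30.0},
                "FacetComponent": {"time": 8.0},
                "SpellCheckComponent": {"time": 80.0},
            },
        }
    }
}

for name, millis in slowest_components(sample):
    print(f"{millis:8.1f} ms  {name}")
```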

Any other suggestions are welcome, since I suspect there's still room to 
squeeze more performance out of the system, and I'm still not sure I'm making 
the most of autowarming...  but this seems like a big step in the right 
direction.  Thanks again for the help!

- Demian

> -----Original Message-----
> From: Erick Erickson [mailto:erickerick...@gmail.com]
> Sent: Friday, June 03, 2011 9:41 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Solr performance tuning - disk i/o?
> 
> This doesn't seem right. Here's a couple of things to try:
> 1> attach &debugQuery=on to your long-running queries. The QTime returned
>      is the time taken to search, NOT including the time to load the docs.
>      That'll help pinpoint whether the problem is the search itself, or
>      assembling the documents.
> 2> Are you autowarming? If so, be sure it's actually done before querying.
> 3> Measure queries after the first few, particularly if you're sorting or
>      faceting.
> 4> What are your JVM settings? How much memory do you have?
> 5> is <enableLazyFieldLoading> set to true in your solrconfig.xml?
> 6> How many docs are you returning?
> 
> 
> There's more, but that'll do for a start.... Let us know if you gather
> more data and it's still slow.
> 
> Best
> Erick
> 
> On Fri, Jun 3, 2011 at 8:44 AM, Demian Katz <demian.k...@villanova.edu> wrote:
> > Hello,
> >
> > I'm trying to move a VuFind installation from an ailing physical
> > server into a virtualized environment, and I'm running into performance
> > problems.  VuFind is a Solr 1.4.1-based application with fairly large
> > and complex records (many stored fields, many words per record).  My
> > particular installation contains about a million records in the index,
> > with a total index size around 6GB.
> >
> > The virtual environment has more RAM and better CPUs than the old
> > physical box, and I am satisfied that my Java environment is
> > well-tuned.  My index is optimized.  Searches that hit the cache respond
> > very well.  The problem is that non-cached searches are very slow - the
> > more keywords I add, the slower they get, to the point of taking 6-12
> > seconds to come back with results on a quiet box and well over a minute
> > under stress testing.  (The old box still took a while for equivalent
> > searches, but it was about twice as fast as the new one).
> >
> > My gut feeling is that disk access reading the index is the
> > bottleneck here, but I know little about the specifics of Solr's
> > internals, so it's entirely possible that my gut is wrong.  Outside
> > testing does show that the virtual environment's disk performance
> > is not as good as the old physical server, especially when multiple
> > processes are trying to access the same file simultaneously.
> >
> > So, two basic questions:
> >
> >
> > 1.)    Would you agree that I'm dealing with a disk bottleneck, or
> >        are there some other factors I should be considering?  Any good
> >        diagnostics I should be looking at?
> >
> > 2.)    If the problem is disk access, is there anything I can tune on
> >        the Solr side to alleviate the problems?
> >
> > Thanks,
> > Demian
> >
