On 12/16/2015 9:08 PM, Erick Erickson wrote:
> Hmmm, take a look at the individual queries on a shard, i.e. peek at
> the Solr logs and see if the fq clause comes through cleanly when you
> see &distrib=false. I suspect this is just a glitch in assembling the
> debug response. If it is, it probably deserves a JIRA. In fact it
> deserves a JIRA in either case I think.
> 
> I don't see anything obvious, but your statement "when the caches are
> cold" points to autowarming as your culprit. What to you have set up
> for autowarming in your caches? And do you have any newSearcher or
> firstSearcher events defined?

Both the main query and the shard queries in the log look fine -- only
one copy of the filters is present.  It does look like a problem in how
the debug response is assembled.
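
For anyone who wants to verify the same thing, an individual shard can
be queried directly by adding distrib=false to an ordinary select
request, something like this (host, core name, and query values here
are placeholders, not my actual setup):

  http://localhost:8983/solr/mycore/select?q=*:*&fq=somefield:value&distrib=false&debugQuery=true

The fq parameter should show up exactly once in the shard's log entry
for that request.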

There are no firstSearcher or newSearcher events.  Some time ago, I did
have the same query defined for each (*:* with a sort), but those have
been commented out.
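
For reference, the commented-out sections looked roughly like this --
the sort field here is a placeholder, and the newSearcher listener was
identical except for the event name:

  <!--
  <listener event="firstSearcher" class="solr.QuerySenderListener">
    <arr name="queries">
      <lst>
        <str name="q">*:*</str>
        <str name="sort">somefield asc</str>
      </lst>
    </arr>
  </listener>
  -->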

These are my cache definitions:

  <filterCache
    class="solr.FastLRUCache"
    size="64"
    initialSize="64"
    autowarmCount="4"
    cleanupThread="true"
    showItems="true"
  />

  <queryResultCache
    class="solr.FastLRUCache"
    size="512"
    initialSize="512"
    autowarmCount="4"
    cleanupThread="true"
  />

  <documentCache
    class="solr.FastLRUCache"
    size="16384"
    initialSize="4096"
    cleanupThread="true"
  />

On the dev server running a 5.3.2 snapshot, the last time a searcher was
opened on one of my large shards, the filterCache took 4626ms to warm
and the queryResultCache took 768ms.  Total warmup time for the searcher
was 5394ms.
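
(With autowarmCount at 4, that 4626ms means the four filter queries
being replayed averaged well over a second apiece, which seems slow for
fq execution against a single shard.)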

In production (4.9.1), the warmup times are worse.  On one of the
shards, total searcher warmup is 20943ms, filterCache is 6740ms, and
queryResultCache is 14202ms.  One config difference: autowarmCount on
the queryResultCache is 8.

Because the autowarmCount values are low yet the autowarm times are
still high, this looks like a general performance issue.  I think I may
be having the problem I'm always telling other people about -- not
enough memory in the server for the OS disk cache.  There is about 150GB
of index data on each production server, 64GB of total RAM, and Solr has
an 8GB heap.  The index is always growing, so I may have crossed one of
those thresholds where performance drops dramatically.
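
Rough math: 64GB of RAM minus the 8GB heap, and a bit more for the OS
and other processes, leaves somewhere around 55GB for the disk cache --
enough to hold barely a third of the 150GB index.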

The production servers are maxed out on memory and each handles half of
the large shards, so I think this simply means I need more hardware to
get more total memory.  If I add another server to each production copy,
each one will only need to handle a third of the total index instead of
half -- about 100GB of index data instead of 150GB.

I am also hoping that the upgrade from 4.9.1 to the 5.3.2 snapshot will
improve performance.

Something I will try right now is bumping the heap to 9GB to see if
maybe there's heap starvation.  Based on the GC logs, I do not think
this is the problem.

Any other thoughts?

Thanks,
Shawn
