Re: Strange debug output for a slow query

Erick Erickson Thu, 17 Dec 2015 09:16:11 -0800

Yeah, if your warmup times are that long, then you're likely having
lots of disk I/O contention or some other bottleneck. That said, you've
mentioned that after a while the queries are fine.
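For a sense of scale, the warm times Shawn reports further down in this thread can be divided by the autowarmCount values he lists. This is a rough sketch. It assumes each cache actually regenerated its full autowarmCount entries (the count is an upper bound). It also assumes production's filterCache autowarmCount is 4, since only the queryResultCache value is said to differ:

```latex
\begin{aligned}
\text{dev filterCache:} \quad & 4626\,\text{ms} / 4 \approx 1157\,\text{ms per entry} \\
\text{dev queryResultCache:} \quad & 768\,\text{ms} / 4 = 192\,\text{ms per entry} \\
\text{prod filterCache:} \quad & 6740\,\text{ms} / 4 = 1685\,\text{ms per entry} \\
\text{prod queryResultCache:} \quad & 14202\,\text{ms} / 8 \approx 1775\,\text{ms per entry}
\end{aligned}
```

More than a second for most individual warming queries is consistent with index data being read from disk rather than from the OS disk cache.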


That indicates to me that you aren't autowarming _enough_, and that
the parts of your index your slow queries need are not being pre-loaded
into memory.
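If more warming turns out to be needed, one option is to re-enable explicit warming queries like the "*:* with a sort" Shawn mentions below. The following is a minimal sketch of a newSearcher/firstSearcher pair using Solr's QuerySenderListener, placed inside the `<query>` section of solrconfig.xml. The sort field `timestamp` is a hypothetical placeholder for whatever field the real queries sort on:

```xml
<!-- Goes inside the <query> section of solrconfig.xml. -->
<!-- Runs each time a new searcher is opened, e.g. after a commit. -->
<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst>
      <str name="q">*:*</str>
      <!-- "timestamp" is a hypothetical field; use the field your queries sort on. -->
      <str name="sort">timestamp desc</str>
    </lst>
  </arr>
</listener>
<!-- Runs once at core startup, when there is no previous searcher to warm from. -->
<listener event="firstSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst>
      <str name="q">*:*</str>
      <str name="sort">timestamp desc</str>
    </lst>
  </arr>
</listener>
```

A sort query like this mainly warms the data needed to sort on that field. It adds to total warmup time rather than reducing it, so it is a trade-off when warmup is already slow.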

Of course you may be bottlenecking on memory or something else; your
idea of upping the memory would be the first thing I'd test. You should
also be able to generate GC logs, and if you are spending all your time
GCing, it should be clear from those logs.

And certainly you may simply have outgrown your hardware...

Best,
Erick

On Thu, Dec 17, 2015 at 8:49 AM, Shawn Heisey <apa...@elyograg.org> wrote:
> On 12/16/2015 9:08 PM, Erick Erickson wrote:
>> Hmmm, take a look at the individual queries on a shard, i.e. peek at
>> the Solr logs and see if the fq clause comes through cleanly when you
>> see &distrib=false. I suspect this is just a glitch in assembling the
>> debug response. If it is, it probably deserves a JIRA. In fact it
>> deserves a JIRA in either case I think.
>>
>> I don't see anything obvious, but your statement "when the caches are
>> cold" points to autowarming as your culprit. What do you have set up
>> for autowarming in your caches? And do you have any newSearcher or
>> firstSearcher events defined?
>
> Both the main query and the shard queries in the log look fine -- only
> one copy of the filters is present.  It does look like a problem with
> the debug info gathering.
>
> There are no firstSearcher or newSearcher events.  Some time ago, I did
> have the same query defined for each (*:* with a sort), but those have
> been commented out.
>
> These are my cache definitions:
>
>   <filterCache
>     class="solr.FastLRUCache"
>     size="64"
>     initialSize="64"
>     autowarmCount="4"
>     cleanupThread="true"
>     showitems="true"
>   />
>
>   <queryResultCache
>     class="solr.FastLRUCache"
>     size="512"
>     initialSize="512"
>     autowarmCount="4"
>     cleanupThread="true"
>   />
>
>   <documentCache
>     class="solr.FastLRUCache"
>     size="16384"
>     initialSize="4096"
>     cleanupThread="true"
>   />
>
> On the dev server running a 5.3.2 snapshot, the last time a searcher was
> opened on one of my large shards, filterCache took 4626ms to warm and
> queryResultCache took 768ms.  Total warmup time for the searcher was 5394ms.
>
> In production (4.9.1), the warmup times are worse.  On one of the
> shards, total searcher warmup is 20943ms, filterCache is 6740ms, and
> queryResultCache is 14202ms.  One difference in config -- autowarmCount on
> queryResultCache is 8.
>
> Because the autowarmCount values are low yet still result in high
> autowarm times, it's looking like a general performance issue.  I think
> I may be having the problem I'm always telling other people about -- not
> enough memory in the server for the OS disk cache.  There is about 150GB
> of index data on each production server, 64GB of total RAM, and Solr has
> an 8GB heap.  The index is always growing, so I think I may have hit one
> of those thresholds where performance drops dramatically.
>
> The production servers are maxed out on memory and are each handling half
> the large shards, so I think this simply means that I need more hardware
> so that there is more total memory.  If I add another server to each
> production copy, then each one will only need to handle a third of the
> total index instead of half.  Each one will have about 100GB of index
> data instead of 150GB.
>
> I am also hoping that upgrading from 4.9.1 to the 5.3.2 snapshot will
> increase performance.
>
> Something I will try right now is bumping the heap to 9GB to see if
> maybe there's heap starvation.  Based on the GC logs, I do not think
> this is the problem.
>
> Any other thoughts?
>
> Thanks,
> Shawn
>
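Shawn's disk-cache reasoning can be made concrete with his numbers. Let C be the RAM left for the OS disk cache. This is a rough estimate that assumes everything outside the 8GB Solr heap is available for caching. In practice the OS and other processes take some of it, so these figures are upper bounds:

```latex
\begin{aligned}
C &\le 64\,\text{GB} - 8\,\text{GB} = 56\,\text{GB} \\
C / 150\,\text{GB} &\approx 0.37 \quad \text{(two servers, half the large shards each)} \\
C / 100\,\text{GB} &= 0.56 \quad \text{(three servers, a third of the large shards each)}
\end{aligned}
```

Adding a server would raise the cacheable fraction from a bit over a third of the index to more than half, without changing per-server RAM. Bumping the heap to 9GB, as proposed, would lower the bound on C to 55GB.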
