https://bugzilla.wikimedia.org/show_bug.cgi?id=47141

Tim Starling <[email protected]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |[email protected]

--- Comment #2 from Tim Starling <[email protected]> ---
We're not having many Icinga alerts at the moment, but there are probably
regular brief overload events that do not result in an alert.

Basically, the problem is that when a new index is swapped in after an rsync,
most of it is not in the kernel cache, so many requests are slow. This results
in client-side timeouts, and thus a flood of "zero results" pages delivered to
users.
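
To illustrate the cold-cache problem, here is a minimal sketch of reading a
freshly rsynced index once before it is swapped in, so that its pages are
already in the kernel cache when real queries arrive. This is only an
illustration, not how lucene-search-2 does it; the function name
warm_page_cache and its parameters are hypothetical.

```python
import os


def warm_page_cache(index_dir, chunk_size=1 << 20):
    """Read every file under index_dir once so the kernel caches its pages.

    Hypothetical helper for illustration only. Returns the number of bytes read.
    """
    total = 0
    for root, _dirs, files in os.walk(index_dir):
        for name in files:
            path = os.path.join(root, name)
            with open(path, "rb") as f:
                # Sequential reads pull the file's pages into the page cache.
                while True:
                    chunk = f.read(chunk_size)
                    if not chunk:
                        break
                    total += len(chunk)
    return total
```

This only helps if the index fits in the memory available for caching;
otherwise later reads evict the earlier pages.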

The deployment of PoolCounter for search probably helped with this, since it
would avoid using up all search workers, so there would still be a couple left
to service Icinga requests. Also, the use of PoolCounter avoids exhausting
Apache MaxClients. This removes the need to set an aggressively low client-side
timeout, so part of the solution here may be just to increase the timeout,
allowing the cluster to ride out short periods of slowness by queueing.
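
The queueing idea can be sketched roughly as follows: cap the number of
concurrent searches, and let extra requests wait a bounded time for a free slot
instead of failing immediately. This is a generic semaphore-based stand-in, not
the PoolCounter API; the class name WorkerLimit and its parameters are
hypothetical.

```python
import threading


class WorkerLimit:
    """Hypothetical stand-in for a PoolCounter-style limit (not its real API)."""

    def __init__(self, max_workers, max_wait_seconds):
        self._slots = threading.BoundedSemaphore(max_workers)
        self._max_wait = max_wait_seconds

    def run(self, search_fn, *args):
        # Queue for a free slot for up to max_wait_seconds.
        if not self._slots.acquire(timeout=self._max_wait):
            return None  # still busy; the caller decides how to report it
        try:
            return search_fn(*args)
        finally:
            self._slots.release()
```

With a longer max_wait_seconds, a short burst of slow requests is absorbed by
waiting rather than turning into empty result pages.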

There is also a "warmup" feature in lucene-search-2 which probably needs to be
configured correctly.

Chad is about to commit a patch for logging of client-side timeouts, so that
will give us more visibility into the scale of the problem.

-- 
You are receiving this mail because:
You are on the CC list for the bug.
_______________________________________________
Wikibugs-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l
