https://bugzilla.wikimedia.org/show_bug.cgi?id=47141
Tim Starling <[email protected]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |[email protected]

--- Comment #2 from Tim Starling <[email protected]> ---
We're not having many Icinga alerts at the moment, but there are probably
regular brief overload events that do not result in an alert.

Basically, the problem is that when a new index is swapped in after an rsync,
most of it is not in the kernel cache, so many requests are slow. This results
in client-side timeouts, and thus a flood of "zero results" pages delivered to
users.

The deployment of PoolCounter for search probably helped with this, since it
prevents all of the search workers from being used up, leaving a couple free to
service Icinga requests. The use of PoolCounter also avoids exhausting Apache
MaxClients. That removes the need for an aggressively low client-side timeout,
so part of the solution here may simply be to increase the timeout, letting the
cluster ride out short periods of slowness by queueing requests.

There is also a "warmup" feature in lucene-search-2 that probably needs to be
configured correctly.

Chad is about to commit a patch that logs client-side timeouts, which will give
us more visibility into the scale of the problem.

-- 
You are receiving this mail because:
You are on the CC list for the bug.
_______________________________________________
Wikibugs-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l
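The comment credits PoolCounter with keeping search from tying up every worker. It caps how many searches run at once and lets a limited number of extra requests queue briefly. The following is a minimal Python sketch of that general idea.

It assumes three knobs: `workers` (concurrent searches), `maxqueue` (requests allowed to wait) and `timeout` (seconds a waiter will wait). These parameter names, the class `PoolCounterSketch` and the exceptions `QueueFull` and `WaitTimeout` are illustrative and hypothetical. This is not the real PoolCounter daemon, its protocol or its configuration.

```python
import threading


class QueueFull(Exception):
    """Hypothetical: the wait queue is already at capacity, so fail fast."""


class WaitTimeout(Exception):
    """Hypothetical: no worker slot freed up within `timeout` seconds."""


class PoolCounterSketch:
    """Bounded concurrency plus a bounded, time-limited wait queue.

    Illustrative only; this is not the real PoolCounter daemon or protocol.
    """

    def __init__(self, workers: int, maxqueue: int, timeout: float):
        self.workers = workers    # max searches running at once
        self.maxqueue = maxqueue  # max requests allowed to wait for a slot
        self.timeout = timeout    # seconds a waiting request will wait
        self._active = 0
        self._waiting = 0
        self._cond = threading.Condition()

    def acquire(self) -> None:
        with self._cond:
            if self._active < self.workers:
                self._active += 1
                return
            if self._waiting >= self.maxqueue:
                raise QueueFull()
            self._waiting += 1
            try:
                # wait_for re-checks the predicate after every wakeup and
                # returns False only if the timeout expires first.
                got_slot = self._cond.wait_for(
                    lambda: self._active < self.workers, timeout=self.timeout
                )
            finally:
                self._waiting -= 1
            if not got_slot:
                raise WaitTimeout()
            self._active += 1

    def release(self) -> None:
        with self._cond:
            self._active -= 1
            self._cond.notify()  # wake one waiter, if any

    def __enter__(self):
        self.acquire()
        return self

    def __exit__(self, *exc_info):
        self.release()
        return False
```

Each search call would be wrapped as `with pool: run_search(query)`, where `run_search` is a hypothetical backend call. A request that cannot get a slot either fails immediately (`QueueFull`) or gives up after `timeout` (`WaitTimeout`). Either way it does not hold an Apache child indefinitely. That bounded waiting is what would make a longer client-side timeout reasonably safe.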

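The comment traces the slow requests to a freshly rsynced index that is not yet in the kernel cache. It also mentions lucene-search-2's own "warmup" feature, whose configuration is not shown here.

As a separate, generic technique, the new index files can be read once before the swap, so their pages are already cached when queries arrive. Below is a minimal Python sketch of that approach. It assumes a hypothetical layout in which the search daemon opens the index through a symlink and reopens it after the swap. The names `prewarm_directory`, `swap_in` and `demo` are illustrative and are not part of lucene-search-2.

```python
import os
import tempfile

CHUNK_SIZE = 1 << 20  # read 1 MiB at a time


def prewarm_directory(index_dir):
    """Read every file under index_dir once so its pages enter the OS page cache.

    Returns the number of bytes read. Assumes the index fits in free RAM;
    if it does not, these reads will simply evict other cached pages.
    """
    total = 0
    for root, _dirs, files in os.walk(index_dir):
        for name in files:
            with open(os.path.join(root, name), "rb") as f:
                while chunk := f.read(CHUNK_SIZE):
                    total += len(chunk)
    return total


def swap_in(new_index_dir, live_link):
    """Warm a freshly rsynced index, then atomically repoint live_link at it.

    Hypothetical layout: the search daemon opens the index via the
    `live_link` symlink and reopens it after the swap.
    """
    prewarm_directory(new_index_dir)
    tmp_link = live_link + ".tmp"
    if os.path.lexists(tmp_link):
        os.remove(tmp_link)
    os.symlink(os.path.abspath(new_index_dir), tmp_link)
    # rename(2) replaces an existing symlink atomically on POSIX, so readers
    # see either the old index or the new one, never a missing path.
    os.replace(tmp_link, live_link)


def demo():
    """Build a throwaway one-file 'index', warm it and swap it in."""
    with tempfile.TemporaryDirectory() as base:
        new_dir = os.path.join(base, "index-new")
        os.makedirs(new_dir)
        with open(os.path.join(new_dir, "segments"), "wb") as f:
            f.write(b"x" * 3000)
        live = os.path.join(base, "current")
        warmed = prewarm_directory(new_dir)
        swap_in(new_dir, live)
        return warmed, os.path.basename(os.readlink(live))
```

Pre-reading whole files is cruder than warming with representative queries. It pulls in every file, including parts that queries rarely touch, but it needs no knowledge of the query mix. Which approach suits this cluster better is an open question.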