On 11-03-13 06:05 PM, Tim Starling wrote:
> On 14/03/11 11:48, William Allen Simpson wrote:
>> Secure basically fell over for awhile, generated nothing but proxy
>> errors.  I'm not sure that's what really happened.  It may have been a
>> complete inability to actually send or receive data, resulting in a
>> timeout of some sort.
>>
>> Take a look at the Ganglia graphs.  Free memory gone.  Big spike in
>> processes.  Big drop in network activity!
> It was because of the CPU overload on the entire apache cluster which
> occurred at that time. Secure and every other frontend proxy would
> have reported errors. Domas and I traced it back to job queue cache
> invalidations from an edit to [[Template:Reflist]] on the English
> Wikipedia.
>
> Note that the free memory isn't gone. RRDtool has the very
> unscientific practice of starting the vertical scale at something
> other than zero. It rose because processes use memory, and as you
> noted, the number of processes increased. This is because they were
> queueing, waiting for the overloaded backend cluster to serve them.
>
> -- Tim Starling
Interesting.
Which part specifically do you think actually caused the extreme load?
Having to re-parse a large number of pages as people view them?
Did the issue show up from invalidation pre-queue, or did it crop up
after the jobs were run?
Was this isolated to the secure servers, i.e., it didn't really affect
the whole cluster but was simply an issue because secure doesn't have as
large a deployment as non-secure?

~Daniel Friesen (Dantman, Nadir-Seen-Fire) [http://daniel.friesen.name]


_______________________________________________
Wikitech-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
