On 11-03-13 06:05 PM, Tim Starling wrote:
> On 14/03/11 11:48, William Allen Simpson wrote:
>> Secure basically fell over for awhile, generated nothing but proxy
>> errors. I'm not sure that's what really happened. It may have been a
>> complete inability to actually send or receive data, resulting in a
>> timeout of some sort.
>>
>> Take a look at the Ganglia graphs. Free memory gone. Big spike in
>> processes. Big drop in network activity!
> It was because of the CPU overload on the entire apache cluster which
> occurred at that time. Secure and every other frontend proxy would
> have reported errors. Domas and I traced it back to job queue cache
> invalidations from an edit to [[Template:Reflist]] on the English
> Wikipedia.
>
> Note that the free memory isn't gone. RRDtool has the very
> unscientific practice of starting the vertical scale at something
> other than zero. It rose because processes use memory, and as you
> noted, the number of processes increased. This is because they were
> queueing, waiting for the overloaded backend cluster to serve them.
>
> -- Tim Starling

Interesting. Which part specifically do you think caused the extreme
load? Was it having to re-parse a large number of pages as people
viewed them?

Did the issue show up from the invalidation, pre-queue, or did it crop
up after the jobs were run?

Was this isolated to the secure servers, i.e. it didn't really affect
the whole cluster but was simply an issue because secure doesn't have
as large a deployment as non-secure?

~Daniel Friesen (Dantman, Nadir-Seen-Fire) [http://daniel.friesen.name]

_______________________________________________
Wikitech-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
