(I just posted the following to the tech blog, http://techblog.wikimedia.org)


Last Monday, the Solaris server that stores all image thumbnails developed 
problems. It ran out of memory, became too slow, and eventually started 
crashing. (For the technically inclined: we think the kernel is leaking a file 
system structure in kernel memory.) This caused missing thumbnails across 
Wikimedia projects.

We addressed these problems in the following ways:
* We decreased the load on this server by changing the Squid configuration so 
that it has to handle fewer requests.
* We ordered more memory to double the total physical memory in the relevant 
systems.
* We set up two new Linux servers that will eventually replace the Solaris 
server.

At first, adding these Linux servers in a partially caching setup seemed 
enough to fix the immediate problem. The plan was to copy all thumbnail files 
to them gradually, so that they could eventually replace the Solaris server 
completely.

However, on Saturday night the Solaris server started crashing repeatedly, 
making it necessary to engage the image scalers to regenerate a large part of 
the missing thumbnails. As a result, loading and generating new (uncached) 
thumbnails is currently slower than usual.
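
For readers who want a concrete picture of the fallback described above, here 
is a minimal sketch in Python. It is an illustration only, not our actual 
setup: the class name ThumbnailStore and the old_server and scaler callables 
are hypothetical, and an in-memory dict stands in for a new server's disk. It 
assumes a pull-through arrangement in which a new server answers from its own 
copy, otherwise asks the old server, and falls back to regenerating the 
thumbnail when the old server fails or does not have it.

```python
from typing import Callable, Dict, Optional

# A fetcher returns the thumbnail bytes, or None if it does not have the file.
Fetcher = Callable[[str], Optional[bytes]]


class ThumbnailStore:
    """Hypothetical pull-through thumbnail store (illustration only)."""

    def __init__(self, old_server: Fetcher, scaler: Callable[[str], bytes]) -> None:
        self.local: Dict[str, bytes] = {}  # stands in for the new server's disk
        self.old_server = old_server       # assumed: fetch from the old server
        self.scaler = scaler               # assumed: regenerate the thumbnail

    def get(self, path: str) -> bytes:
        # 1. Serve from the local copy if we already have it.
        if path in self.local:
            return self.local[path]

        # 2. Otherwise ask the old server; treat an I/O failure as a miss.
        try:
            data = self.old_server(path)
        except OSError:
            data = None

        # 3. If the old server failed or lacks the file, regenerate it.
        if data is None:
            data = self.scaler(path)

        # Keep a local copy so the next request is served directly.
        self.local[path] = data
        return data
```

In this sketch every miss is paid for only once per thumbnail, which is why 
the slowdown is limited to thumbnails that have not been copied or cached yet.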

Fortunately, most users have not experienced serious problems while using the 
site, since most thumbnails are cached by our HTTP caching layer. We cannot 
determine exactly how long it will take for thumbnail service to recover 
completely, but we expect it to take no more than a few days.

Over the past months we have been developing a new and more scalable 
architecture for media storage, which will solve these problems once and for 
all. We hope to deploy this new architecture within a few months, also making 
use of the new data center. Please watch the Tech Blog for updates on this 
project.


-- 
Mark Bergsma <[email protected]>
Operations Engineering Program Manager
Wikimedia Foundation



