Quick update on dump status:

* Dumps are back up and running on srv31, the old dump batch host.

Please note that unlike the wiki sites themselves, dump activity is 
*not* considered time-critical -- there is no emergency requirement to 
get them running again as soon as possible.

Getting dumps running again after a few days is nearly as good as 
getting them running again immediately. Yes, it sucks when it takes 
longer than we'd like. No, it's not the end of the world.


* Dump runner redesign is in progress.

I've chatted a bit with Tim in the past about rearranging the architecture 
of the dump system to allow for horizontal scaling, which will make the 
big history dumps much, much faster by distributing the work across 
multiple CPUs or hosts, where it's currently limited to a single thread 
per wiki.
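To make the idea concrete, here's a rough sketch of the chunking approach (all names here are hypothetical, not the actual dump scripts; the real runner would farm chunks out to separate processes or hosts, threads are just the simplest way to show the shape):

```python
# Sketch: split a wiki's page-ID space into chunks and dump each chunk
# in its own worker, instead of one thread walking the whole wiki.
from concurrent.futures import ThreadPoolExecutor

def dump_page_range(chunk):
    """Hypothetical worker: in the real system this would stream the
    revisions for pages [start, end) into a partial XML dump file."""
    start, end = chunk
    return f"dump-{start}-{end}.xml"

def parallel_dump(max_page_id, workers=4):
    """Split [0, max_page_id) into roughly equal chunks and dump them
    concurrently; the partial files would be stitched together afterward."""
    step = max_page_id // workers + 1
    chunks = [(i, min(i + step, max_page_id))
              for i in range(0, max_page_id, step)]
    with ThreadPoolExecutor(workers) as pool:
        return list(pool.map(dump_page_range, chunks))

if __name__ == "__main__":
    print(parallel_dump(1000, workers=4))
```

The win is that a slow wiki no longer serializes on one CPU; the open question is how to merge the partial outputs cheaply.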

We seem to be in agreement on the basic arch, and Tomasz is now in 
charge of making this happen; he'll be poking at infrastructure for this 
over the next few days -- using his past experience with distributed 
index build systems at Amazon to guide his research -- and will report 
to y'all later this week with some more concrete details.


* Dump format changes are in progress.

Robert Rohde's proof-of-concept code for diff-based dumps is in our SVN and 
available for testing.

We'll be looking into the possibility of integrating this to see what 
the effect on dump performance is; currently performance and 
reliability are our primary concerns, rather than output file size, but 
the two can intersect since bzip2 data compression is a significant time 
factor.

This will be pushed back to later if we don't see an immediate 
generation-speed improvement, but it's very much a desired project since 
it will make the full-history dump files much smaller.
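For the curious, the core idea behind a diff-based format can be sketched like this (illustrative only, not Robert's actual code): store the first revision of a page in full, then each later revision as a diff against its predecessor, so the full-history dump carries far less redundant text.

```python
# Sketch: emit revision 1 in full, then unified diffs for each
# subsequent revision of the same page.
import difflib

def revision_deltas(revisions):
    """Yield the first revision's full text, then a unified diff for
    each later revision against the one before it."""
    prev = None
    for text in revisions:
        lines = text.splitlines()
        if prev is None:
            yield text
        else:
            yield "\n".join(difflib.unified_diff(prev, lines, lineterm=""))
        prev = lines

revs = ["Hello world.", "Hello world.\nA second line."]
full, delta = revision_deltas(revs)
```

Reconstructing a given revision then means replaying diffs forward from the last full copy, which is exactly why generation and read-back speed need to be measured before we commit to the format.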

-- brion

_______________________________________________
Wikitech-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
