Robert Rohde wrote:
> Many of the things done for the statistical analysis of database dumps
> should be suitable for parallelization (e.g. break the dump into
> chunks, process the chunks in parallel and sum the results). You
> could talk to Erik Zachte. I don't know if his code has already been
> designed for parallel processing though.
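The chunk-and-sum approach described above is essentially a small map-reduce. A minimal sketch of it, assuming a dump already split into line chunks and a placeholder statistic (counting <page> elements, not Erik Zachte's actual code):

```python
# Hypothetical sketch: split dump lines into chunks, process the chunks
# in parallel, and sum the per-chunk results. The statistic counted here
# (<page> occurrences) is just an illustrative stand-in.
from multiprocessing import Pool

def count_pages(chunk):
    """Count how many <page> elements appear in one chunk of the dump."""
    return sum(line.count("<page>") for line in chunk)

def parallel_page_count(lines, workers=4):
    """Split the lines into chunks, map count_pages over them, sum the results."""
    size = max(1, len(lines) // workers)
    chunks = [lines[i:i + size] for i in range(0, len(lines), size)]
    with Pool(workers) as pool:
        return sum(pool.map(count_pages, chunks))

if __name__ == "__main__":
    sample = ["<page>", "  <title>A</title>", "</page>", "<page>", "</page>"]
    print(parallel_page_count(sample, workers=2))  # 2
```

This only parallelises cleanly when the chunks can be produced independently, which is exactly where compressed input gets in the way.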
I don't think it's a good candidate, since you are presumably working
with compressed files, and decompression serialises the processing (and
is most likely the bottleneck, too).

> Another option might be to look at the methods for compressing old
> revisions (is [1] still current?).
>
> I make heavy use of parallel processing in my professional work (not
> related to wikis), but I can't really think of any projects I have at
> hand that would be accessible and completable in a month.
>
> -Robert Rohde
>
> [1] http://www.mediawiki.org/wiki/Manual:CompressOld.php

It can be used; I am unsure whether it is actually used by WMF.

Another thing that would be nice to have parallelised is the parser
tests. That would require adding cotasks (coroutine-like tasks) to PHP,
or something similar. The most similar extension I know of is runkit,
which works the other way around: several PHP scopes instead of several
threads in one scope.

_______________________________________________
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l