Robert Rohde wrote:
> Many of the things done for the statistical analysis of database dumps
> should be suitable for parallelization (e.g. break the dump into
> chunks, process the chunks in parallel and sum the results).  You
> could talk to Erik Zachte.  I don't know if his code has already been
> designed for parallel processing though.

I don't think it's a good candidate, since you are presumably reading
compressed dump files, and decompression forces the data through a
single sequential stream (and is most likely the bottleneck, too).
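
For what it's worth, here is a rough sketch of the chunk-and-sum
approach Robert describes, assuming the dump has already been split
into decompressed chunk files (the file naming and the page-counting
logic are made up for illustration). The per-chunk work forks cleanly;
the decompression/splitting step in front of it is the part that stays
serial:

<?php
// Sketch only: process pre-split, decompressed dump chunks in parallel
// with pcntl_fork and sum the per-chunk results. The chunk file names
// and count_pages() are assumptions for illustration.

function count_pages( $chunkFile ) {
    // Count <page> elements in one already-decompressed chunk.
    $count = 0;
    $fh = fopen( $chunkFile, 'r' );
    while ( ( $line = fgets( $fh ) ) !== false ) {
        $count += substr_count( $line, '<page>' );
    }
    fclose( $fh );
    return $count;
}

$chunks = glob( 'dump-chunk-*.xml' ); // assumed naming scheme
$pids = array();

foreach ( $chunks as $i => $chunk ) {
    $pid = pcntl_fork();
    if ( $pid === 0 ) {
        // Child: write its partial result and exit.
        file_put_contents( "result-$i.txt", count_pages( $chunk ) );
        exit( 0 );
    }
    $pids[] = $pid;
}

// Parent: wait for all children, then sum the partial results.
foreach ( $pids as $pid ) {
    pcntl_waitpid( $pid, $status );
}
$total = 0;
foreach ( $chunks as $i => $chunk ) {
    $total += (int)file_get_contents( "result-$i.txt" );
}
echo "Total pages: $total\n";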


> Another option might be to look at the methods for compressing old
> revisions (is [1] still current?).
> 
> I make heavy use of parallel processing in my professional work (not
> related to wikis), but I can't really think of any projects I have at
> hand that would be accessible and completable in a month.
> 
> -Robert Rohde
> 
> [1] http://www.mediawiki.org/wiki/Manual:CompressOld.php

It can still be used; I am not sure whether WMF actually runs it.

Another thing that would be nice to parallelise is the parser tests.
That would require adding something like cotasks (cooperative tasks) to
PHP. The closest extension I know of is runkit, which works the other
way around: several PHP scopes in one thread instead of several threads
in one scope. A process-level sketch follows below.
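
Absent cotasks in PHP itself, the coarse-grained workaround would be to
split the parser tests across a few forked worker processes. A sketch,
assuming pcntl is available; loadParserTestCases() and runParserTest()
are placeholders, not the real parserTests.php internals:

<?php
// Hypothetical sketch: distribute parser test cases over N forked workers.
// loadParserTestCases() and runParserTest() stand in for the actual
// parserTests.php machinery and are not real MediaWiki functions.

$workers = 4;
$tests = loadParserTestCases();   // assumed: returns an array of test cases
$failures = 0;
$pids = array();

for ( $w = 0; $w < $workers; $w++ ) {
    $pid = pcntl_fork();
    if ( $pid === 0 ) {
        // Child: run every $workers-th test, report failures via exit code.
        $failed = 0;
        for ( $i = $w; $i < count( $tests ); $i += $workers ) {
            if ( !runParserTest( $tests[$i] ) ) {
                $failed++;
            }
        }
        exit( min( $failed, 255 ) );
    }
    $pids[] = $pid;
}

// Parent: collect the failure counts from the workers.
foreach ( $pids as $pid ) {
    pcntl_waitpid( $pid, $status );
    $failures += pcntl_wexitstatus( $status );
}
echo "$failures parser test failure(s)\n";

That only buys process-level parallelism and loses shared state between
tests, which is why proper cotasks/threads in one scope would be the
nicer solution.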

