Much of the statistical analysis done on database dumps should be
suitable for parallelization (e.g. break the dump into chunks, process
the chunks in parallel, and sum the results).  You could talk to Erik
Zachte, though I don't know whether his code was designed with
parallel processing in mind.
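
To make the pattern concrete, here is a rough sketch in Python (the
chunk file names are made up, and Erik's actual pipeline surely looks
different):

    # Rough sketch of the chunk/process/sum pattern (the chunk file
    # names are hypothetical; a real run would split the XML dump first).
    from multiprocessing import Pool

    def count_revisions(chunk_path):
        # Count <revision> open tags in one chunk of the dump.
        n = 0
        with open(chunk_path, encoding="utf-8") as f:
            for line in f:
                n += line.count("<revision>")
        return n

    if __name__ == "__main__":
        chunks = ["dump-chunk-%03d.xml" % i for i in range(16)]
        with Pool() as pool:
            total = sum(pool.map(count_revisions, chunks))
        print("total revisions:", total)

The same map-then-reduce shape carries over directly to OpenMP or MPI;
only the plumbing changes.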

Another option might be to look at the methods for compressing old
revisions (is [1] still current?).
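
If the basic trick is still what I remember (adjacent revisions are
near-duplicates, so they compress far better concatenated together
than one at a time), a toy illustration with made-up data:

    # Toy numbers only: adjacent revisions are near-duplicates, so a
    # concatenated batch compresses far better than one-at-a-time gzip.
    import zlib

    revisions = [("Lorem ipsum dolor sit amet. " * 200) + "edit %d" % i
                 for i in range(20)]

    per_rev = sum(len(zlib.compress(r.encode())) for r in revisions)
    batched = len(zlib.compress("\0".join(revisions).encode()))
    print(per_rev, "bytes compressed individually vs", batched, "batched")

Batching the (de)compression of many such blobs across cores is the
kind of embarrassingly parallel job that would fit in a month.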

I make heavy use of parallel processing in my professional work (not
related to wikis), but I can't think of any project at hand that would
be both accessible and completable in a month.

-Robert Rohde

[1] http://www.mediawiki.org/wiki/Manual:CompressOld.php

On Sun, Oct 24, 2010 at 5:42 PM, Aryeh Gregor
<simetrical+wikil...@gmail.com> wrote:
> This term I'm taking a course in high-performance computing
> <http://cs.nyu.edu/courses/fall10/G22.2945-001/index.html>, and I have
> to pick a topic for a final project.  According to the assignment
> <http://cs.nyu.edu/courses/fall10/G22.2945-001/final-project.pdf>,
> "The only real requirement is that it be something in parallel."  In
> the class, we covered
>
> * Microoptimization of single-threaded code (efficient use of CPU cache, etc.)
> * Multithreaded programming using OpenMP
> * GPU programming using OpenCL
>
> and will probably briefly cover distributed computing over multiple
> machines with MPI.  I will have access to a high-performance cluster
> at NYU, including lots of CPU nodes and some high-end GPUs.  Unlike
> most of the other people in the class, I don't have any interesting
> science projects I'm working on, so something useful to
> MediaWiki/Wikimedia/Wikipedia is my first thought.  If anyone has any
> suggestions, please share.  (If you have non-Wikimedia-related ones,
> I'd also be interested in hearing about them offlist.)  They shouldn't
> be too ambitious, since I have to finish them in about a month, while
> doing work for three other courses and a bunch of other stuff.
>
> My first thought was to write a GPU program to crack MediaWiki
> password hashes as quickly as possible, then use what we've studied in
> class about GPU architecture to design a hash function that would be
> as slow as possible to crack on a GPU relative to its PHP execution
> speed, as Tim suggested a while back.  However, maybe there's
> something more interesting I could do.
>
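
P.S. On the hash-cracking idea: if I remember the format right,
MediaWiki's legacy :B: hashes are md5(salt . '-' . md5(password)), so
the per-candidate kernel you'd parallelize on the GPU is tiny.  A
reference check in Python (the function names are mine):

    import hashlib

    def mw_b_hash(password, salt):
        # Legacy MediaWiki :B: scheme: md5(salt . '-' . md5(password)).
        inner = hashlib.md5(password.encode()).hexdigest()
        return hashlib.md5((salt + "-" + inner).encode()).hexdigest()

    def check(candidate, salt, target_hex):
        return mw_b_hash(candidate, salt) == target_hex

Two chained MD5s is about as GPU-friendly as a hash gets, which is why
the second half of your idea (designing something deliberately hostile
to GPUs relative to its PHP speed) is the interesting part.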

_______________________________________________
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
