Indeed, I run parallel dumps based on a range of ids... although the
algorithm needed some tweaking.  I expect to get back to looking at that
pretty soon.
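For concreteness, here is a minimal sketch of the id-range splitting idea in Python (one of the languages Diederik mentions below). This is not the actual dumps code; `dump_range` is a hypothetical placeholder for whatever actually serializes the pages, and the ids/worker count are made up:

```python
# Sketch only: split a page-id range into contiguous chunks and dump
# each chunk in its own worker process.
from multiprocessing import Pool

def split_ranges(min_id, max_id, workers):
    """Split [min_id, max_id] into up to `workers` contiguous, near-equal chunks."""
    span = max_id - min_id + 1
    step = -(-span // workers)  # ceiling division
    ranges = []
    start = min_id
    while start <= max_id:
        end = min(start + step - 1, max_id)
        ranges.append((start, end))
        start = end + 1
    return ranges

def dump_range(id_range):
    start, end = id_range
    # Placeholder: a real worker would stream pages with
    # start <= page_id <= end into its own output file, so a crash
    # only costs one chunk, not the whole dump.
    return "pages_%d_%d.xml" % (start, end)

if __name__ == "__main__":
    ranges = split_ranges(1, 1000, 4)
    with Pool(processes=4) as pool:
        outputs = pool.map(dump_range, ranges)
    print(outputs)
```

One nice property of per-chunk output files is exactly the restartability point raised below: an interrupted run only has to redo the chunks that didn't finish.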

Ariel

On Wed, 15-12-2010 at 13:01 -0800, Diederik van Liere wrote:
> Dear devs,
> 
> I would like to initiate a discussion about how to reduce the time required 
> to generate dump files. A while ago Emmanuel Engelhart opened a bug report 
> suggesting that this feature be parallelized, and I would like to go through 
> the available options and hopefully determine a course of action.
> 
> The current process is straightforward and sequential (as far as I know): it 
> reads table by table and row by row and stores the output. The drawbacks of 
> this process are that it takes increasingly long to generate a dump as 
> the different projects continue to grow, and that if the process halts or is 
> interrupted, it has to start all over again.
> 
> I believe that there are two approaches to parallelizing the export dump:
> 1) Launch multiple PHP processes that each take care of a particular range of 
> ids. This might not be called true parallelization, but it achieves the same 
> goal. The reason for this approach is that PHP has very limited (maybe no) 
> support for parallelization / multiprocessing. The only thing PHP can do is 
> fork a process (I might be incorrect about this).
> 
> 
> 2) Use a different language with built-in support for multiprocessing, like 
> Java or Python. I am not intending to start a heated debate, but I think this 
> is an option that should at least be on the table and be discussed. 
> Obviously, an important reason not to do it is that it's a different 
> language. I am not sure how integral the export functionality is to MediaWiki; 
> if it is integral, then this is a dead end.
> 
> However, if the export functionality is primarily used by Wikimedia and 
> nobody else, then we might consider a different language. Or we could make a 
> standalone app that is not part of MediaWiki and whose use is internal 
> to Wikimedia.
> 
> 
> If I am missing other approaches or solutions, then please chime in.
> 
> Best regards,
> 
> 
> Diederik
> _______________________________________________
> Wikitech-l mailing list
> [email protected]
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l


