Indeed, I run parallel dumps based on a range of IDs (roughly along the lines of the sketch below), although the algorithm needs tweaking. I expect to get back to looking at that pretty soon.
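For concreteness, a minimal sketch of that kind of ID-range split, assuming MediaWiki's maintenance/dumpBackup.php and its --start/--end page ID bounds; the page ID ceiling, worker count, and output filenames are made up for illustration:

#!/usr/bin/env python
# Split the page ID space into equal ranges and run one dump worker per
# range in parallel. Assumes maintenance/dumpBackup.php with --start/--end
# page ID bounds (--end stops *before* the given ID); MAX_PAGE_ID, WORKERS
# and the output filenames are illustrative only.
import subprocess

MAX_PAGE_ID = 10000000   # highest page ID on the wiki (site-specific)
WORKERS = 4              # number of parallel dump processes

chunk = MAX_PAGE_ID // WORKERS
jobs = []
for i in range(WORKERS):
    start = i * chunk
    # Last worker takes any remainder; +1 because --end is exclusive.
    end = MAX_PAGE_ID + 1 if i == WORKERS - 1 else (i + 1) * chunk
    out = open('pages-%d-%d.xml' % (start, end), 'wb')
    proc = subprocess.Popen(
        ['php', 'maintenance/dumpBackup.php', '--full',
         '--start=%d' % start, '--end=%d' % end],
        stdout=out)
    jobs.append((proc, out, start, end))

# Wait for all workers; a failed range can be rerun on its own instead
# of restarting the whole dump from scratch.
for proc, out, start, end in jobs:
    status = proc.wait()
    out.close()
    if status != 0:
        print('range %d-%d failed with status %d' % (start, end, status))

Because each range is a separate process writing its own file, an interrupted range can be restarted by itself, which addresses the restart-from-scratch problem mentioned below.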
Ariel

On Wed, 2010-12-15 at 13:01 -0800, Diederik van Liere wrote:
> Dear devs,
>
> I would like to initiate a discussion about how to reduce the time
> required to generate dump files. A while ago Emmanuel Engelhart opened
> a bug report suggesting that this feature be parallelized, and I would
> like to go through the available options and hopefully determine a
> course of action.
>
> The current process is straightforward and sequential (as far as I
> know): it reads table by table and row by row and stores the output.
> The drawbacks of this process are that it takes increasingly more time
> to generate a dump as the different projects continue to grow, and
> that when the process halts or is interrupted it needs to start all
> over again.
>
> I believe there are two approaches to parallelizing the export dump:
>
> 1) Launch multiple PHP processes that each take care of a particular
> range of IDs. This might not be called true parallelization, but it
> achieves the same goal. The reason for this approach is that PHP has
> very limited (maybe no) support for parallelization / multiprocessing;
> the only thing PHP can do is fork a process (I might be incorrect
> about this).
>
> 2) Use a different language with builtin support for multiprocessing,
> like Java or Python. I am not intending to start a heated debate, but
> I think this is an option that at least should be on the table and be
> discussed. Obviously, an important reason not to do it is that it's a
> different language. I am not sure how integral the export
> functionality is to MediaWiki, and if it is then this is a dead end.
>
> However, if the export functionality is primarily used by Wikimedia
> and nobody else, then we might consider a different language. Or we
> could make a standalone app that is not part of MediaWiki and whose
> use is internal to Wikimedia.
>
> If I am missing other approaches or solutions then please chime in.
>
> Best regards,
>
> Diederik
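A rough sketch of what the second option above could look like with Python's builtin multiprocessing; dump_range() is a hypothetical stand-in for the code that would actually read pages and write XML, and the chunk size and worker count are arbitrary:

# Hypothetical sketch of option 2: the same ID-range split expressed
# with Python's builtin multiprocessing, so worker management and
# progress tracking live in one coordinating process.
from multiprocessing import Pool

WORKERS = 4
CHUNK = 2500000  # page IDs per worker; arbitrary for illustration

def dump_range(bounds):
    start, end = bounds
    # ... fetch pages with start <= page_id < end and serialize to XML ...
    return bounds  # report the completed range so the parent can checkpoint

if __name__ == '__main__':
    ranges = [(i * CHUNK, (i + 1) * CHUNK) for i in range(WORKERS)]
    with Pool(processes=WORKERS) as pool:
        # imap_unordered yields ranges as they finish, so progress can be
        # recorded and a failed range rescheduled without a full restart.
        for start, end in pool.imap_unordered(dump_range, ranges):
            print('finished range %d-%d' % (start, end))

The appeal over a shell-level wrapper is that the coordinating process can track which ranges completed and reschedule failures without external bookkeeping.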
