Re: [Wikitech-l] Parallelizing export dump (bug 24630)

2010-12-19 Thread Diederik van Liere
To continue the discussion on how to improve the performance, would it be possible to distribute the dumps as a 7z / gz / other format archive containing multiple smaller XML files? It's quite tricky to split a very large XML file into smaller valid XML files and if the dumping process is already

Re: [Wikitech-l] Parallelizing export dump (bug 24630)

2010-12-19 Thread Platonides
Diederik van Liere wrote: To continue the discussion on how to improve the performance, would it be possible to distribute the dumps as a 7z / gz / other format archive containing multiple smaller XML files? It's quite tricky to split a very large XML file into smaller valid XML files and if

Re: [Wikitech-l] Parallelizing export dump (bug 24630)

2010-12-19 Thread Diederik van Liere
Which dump file is offered in smaller sub files? On Sun, Dec 19, 2010 at 6:02 PM, Platonides platoni...@gmail.com wrote: Diederik van Liere wrote: To continue the discussion on how to improve the performance, would it be possible to distribute the dumps as a 7z / gz / other format archive

Re: [Wikitech-l] Parallelizing export dump (bug 24630)

2010-12-19 Thread Platonides
Diederik van Liere wrote: Which dump file is offered in smaller sub files? http://download.wikimedia.org/enwiki/20100904/ Also see http://wikitech.wikimedia.org/view/Dumps/Parallelization

Re: [Wikitech-l] Parallelizing export dump (bug 24630)

2010-12-19 Thread Diederik van Liere
Okay, no clue how I could have missed that. My Google skills failed me :) Thanks for the pointer! Best, Diederik On 2010-12-19, at 6:21 PM, Platonides wrote: Diederik van Liere wrote: Which dump file is offered in smaller sub files? http://download.wikimedia.org/enwiki/20100904/ Also see

Re: [Wikitech-l] Parallelizing export dump (bug 24630)

2010-12-19 Thread Ariel T. Glenn
On 20-12-2010, Mon, at 00:21 +0100, Platonides wrote: Diederik van Liere wrote: Which dump file is offered in smaller sub files? http://download.wikimedia.org/enwiki/20100904/ Also see http://wikitech.wikimedia.org/view/Dumps/Parallelization Expect to see more of this

Re: [Wikitech-l] Parallelizing export dump (bug 24630)

2010-12-17 Thread Roan Kattouw
2010/12/17 Platonides platoni...@gmail.com: -even assuming that the memcached can happily handle it and no other data is affected by it- the network delay makes it a non-free operation. Because memcached uses LRU, I think this'll also flood a lot of stuff out of the cache. Roan Kattouw
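
The flooding effect Roan describes can be illustrated with a toy least-recently-used cache. This is only a sketch under stated assumptions: `LRUCache`, its capacity of 3, and the key names are hypothetical and made up for the example, and real memcached eviction is more involved than a single ordered dictionary.

```python
from collections import OrderedDict


class LRUCache:
    """Toy LRU cache (hypothetical, for illustration; not memcached's implementation)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()  # oldest entry first, newest entry last

    def get(self, key):
        if key not in self.items:
            return None
        self.items.move_to_end(key)  # a read marks the entry as recently used
        return self.items[key]

    def set(self, key, value):
        self.items[key] = value
        self.items.move_to_end(key)
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # evict the least recently used entry


cache = LRUCache(capacity=3)

# Entries that regular traffic relies on.
for key in ("hot-a", "hot-b", "hot-c"):
    cache.set(key, "page data")

# A bulk pass (standing in for a dump run) writes each revision once.
for rev_id in range(1, 4):
    cache.set(f"rev-{rev_id}", "revision text")

survivors = list(cache.items)
print(survivors)
```

After the bulk pass, every entry that regular traffic relied on has been evicted in favour of entries that will probably never be read again.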

Re: [Wikitech-l] Parallelizing export dump (bug 24630)

2010-12-16 Thread Platonides
Roan Kattouw wrote: I'm not sure how hard this would be to achieve (you'd have to correlate blob parts with revisions manually using the text table; there might be gaps for deleted revs because ES is append-only) or how much it would help (my impression is ES is one of the slower parts of our

Re: [Wikitech-l] Parallelizing export dump (bug 24630)

2010-12-16 Thread Ariel T. Glenn
On 17-12-2010, Fri, at 00:52 +0100, Platonides wrote: Roan Kattouw wrote: I'm not sure how hard this would be to achieve (you'd have to correlate blob parts with revisions manually using the text table; there might be gaps for deleted revs because ES is append-only) or how

[Wikitech-l] Parallelizing export dump (bug 24630)

2010-12-15 Thread Diederik van Liere
Dear devs, I would like to initiate a discussion about how to reduce the time required to generate dump files. A while ago Emmanuel Engelhart opened a bug report suggesting that this feature be parallelized, and I would like to go through the available options and hopefully determine a course of

Re: [Wikitech-l] Parallelizing export dump (bug 24630)

2010-12-15 Thread Ariel T. Glenn
Indeed I run parallel dumps based on a range of ids... although the algorithm needed tweaking. I expect to get back to looking at that pretty soon. Ariel On 15-12-2010, Wed, at 13:01 -0800, Diederik van Liere wrote: Dear devs, I would like to initiate a discussion
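
As a rough illustration of dumping by id range, the sketch below splits a page id space into contiguous ranges, one per worker. It is not the algorithm Ariel refers to, which the thread does not describe; `id_ranges`, `max_id`, and `workers` are hypothetical names chosen for the example.

```python
def id_ranges(max_id, workers):
    """Split ids 1..max_id into contiguous, near-equal (start, end) ranges.

    Hypothetical helper for illustration only; both bounds are inclusive.
    """
    if max_id < 1 or workers < 1:
        raise ValueError("max_id and workers must be positive")
    workers = min(workers, max_id)  # never create empty ranges
    base, extra = divmod(max_id, workers)
    ranges = []
    start = 1
    for i in range(workers):
        size = base + (1 if i < extra else 0)  # spread the remainder over the first ranges
        ranges.append((start, start + size - 1))
        start += size
    return ranges


for start, end in id_ranges(max_id=10, workers=3):
    print(f"worker dumps page ids {start}..{end}")
```

Equal-sized id ranges do not imply equal amounts of work, since pages differ widely in how many revisions they have, so a real splitter would likely need to weight ranges by something other than the id count.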

Re: [Wikitech-l] Parallelizing export dump (bug 24630)

2010-12-15 Thread Roan Kattouw
2010/12/15 Diederik van Liere dvanli...@gmail.com: However, if the export functionality is primarily used by Wikimedia and nobody else then we might consider a different language. Or, we make a standalone app that is not part of MediaWiki and its use is only internal to Wikimedia. If