Diederik van Liere wrote:
To continue the discussion on how to improve performance: would it be possible to distribute the dumps as a 7z / gz / other-format archive containing multiple smaller XML files? It's quite tricky to split a very large XML file into smaller valid XML files, and if the dumping process is already ...
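To make the "many smaller valid XML files" idea concrete, here is a minimal sketch (not the actual dump code) that streams <page> elements out of an existing pages dump and rewrites them in fixed-size batches, each wrapped in its own <mediawiki> root so every chunk parses on its own. The chunk size, output file names, and the omission of the <siteinfo> header are arbitrary simplifications:

    import xml.etree.ElementTree as ET

    PAGES_PER_CHUNK = 1000  # arbitrary batch size

    def write_chunk(pages, path):
        # Each chunk gets its own root element so it is valid XML by itself.
        with open(path, "w", encoding="utf-8") as out:
            out.write('<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.4/">\n')
            out.writelines(pages)
            out.write('</mediawiki>\n')

    def split_dump(dump_path, prefix="pages-chunk"):
        context = ET.iterparse(dump_path, events=("start", "end"))
        _, root = next(context)          # grab the <mediawiki> root element
        chunk, buf = 0, []
        for event, elem in context:
            if event == "end" and elem.tag.endswith("page"):
                buf.append(ET.tostring(elem, encoding="unicode"))
                root.clear()             # drop already-serialized pages to keep memory flat
                if len(buf) >= PAGES_PER_CHUNK:
                    write_chunk(buf, "%s-%05d.xml" % (prefix, chunk))
                    chunk, buf = chunk + 1, []
        if buf:
            write_chunk(buf, "%s-%05d.xml" % (prefix, chunk))

Each chunk could then be compressed individually and bundled into a single 7z or tar archive for download.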
On Sun, Dec 19, 2010 at 6:02 PM, Platonides platoni...@gmail.com wrote:
Diederik van Liere wrote:
To continue the discussion on how to improve performance: would it be possible to distribute the dumps as a 7z / gz / other-format archive containing multiple smaller XML files? ...
Which dump file is offered in smaller sub files?
Diederik van Liere wrote:
Which dump file is offered in smaller sub files?
http://download.wikimedia.org/enwiki/20100904/
Also see http://wikitech.wikimedia.org/view/Dumps/Parallelization
Okay, no clue how I could have missed that. My Google skills failed me :)
Thanks for the pointer!
Best,
Diederik
On 20-12-2010 (Mon), at 00:21 +0100, Platonides wrote:
Also see http://wikitech.wikimedia.org/view/Dumps/Parallelization
Expect to see more of this.
2010/12/17 Platonides platoni...@gmail.com:
... even assuming that memcached can happily handle it and no other data is affected by it, the network delay makes it a non-free operation.
Because memcached uses LRU, I think this'll also flood a lot of stuff out of the cache.
Roan Kattouw
Roan Kattouw wrote:
I'm not sure how hard this would be to achieve (you'd have to correlate blob parts with revisions manually using the text table; there might be gaps for deleted revs because ES is append-only) or how much it would help (my impression is ES is one of the slower parts of our ...
On 17-12-2010 (Fri), at 00:52 +0100, Platonides wrote:
Roan Kattouw wrote:
I'm not sure how hard this would be to achieve (you'd have to correlate blob parts with revisions manually using the text table ...
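For anyone wondering what "correlate blob parts with revisions manually using the text table" would look like in practice, here is a rough sketch, assuming the 2010-era schema where revision.rev_text_id points at text.old_id and old_text holds an External Storage pointer of the form DB://cluster/id when old_flags contains 'external'. The connection handling is a placeholder, not the production setup:

    import MySQLdb

    # db = MySQLdb.connect(host="...", user="...", db="enwiki")  # placeholder

    def es_pointers(db, first_rev, last_rev):
        """Map rev_id -> (cluster, blob id) for a contiguous range of revisions."""
        cur = db.cursor()
        cur.execute(
            "SELECT r.rev_id, t.old_flags, t.old_text "
            "FROM revision r JOIN text t ON t.old_id = r.rev_text_id "
            "WHERE r.rev_id BETWEEN %s AND %s",
            (first_rev, last_rev))
        pointers = {}
        for rev_id, flags, blob in cur.fetchall():
            flags = flags.decode() if isinstance(flags, bytes) else flags
            blob = blob.decode() if isinstance(blob, bytes) else blob
            if "external" in flags:
                # blob looks like DB://cluster/id[/itemid]; revisions whose text
                # was removed may simply be absent, since ES itself is append-only.
                cluster, _, item = blob[len("DB://"):].partition("/")
                pointers[rev_id] = (cluster, item)
        return pointers

Grouping the resulting pointers by cluster and blob id is presumably what would let the text fetching happen in larger, roughly sequential reads instead of one random lookup per revision.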
Dear devs,
I would like to initiate a discussion about how to reduce the time required to generate the dump files. A while ago Emmanuel Engelhart opened a bug report suggesting that this process be parallelized, and I would like to go through the available options and hopefully determine a course of action.
Indeed, I run parallel dumps based on a range of ids, although the algorithm needed tweaking. I expect to get back to looking at that pretty soon.
Ariel
On 15-12-2010 (Wed), at 13:01 -0800, Diederik van Liere wrote:
Dear devs,
I would like to initiate a discussion ...
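A rough sketch of what the ID-range splitting Ariel mentions might look like from the outside; the worker count, output names, and the use of dumpBackup.php's --start/--end options here are illustrative assumptions, not a description of the actual production scripts:

    import subprocess

    def run_parallel_dump(max_page_id, workers=8):
        # Cut the page id space into equal slices and run one dump process per slice.
        step = max_page_id // workers + 1
        procs = []
        for i in range(workers):
            start = i * step + 1
            if start > max_page_id:
                break
            end = min((i + 1) * step, max_page_id)
            procs.append(subprocess.Popen([
                "php", "maintenance/dumpBackup.php", "--full",
                "--start=%d" % start,
                "--end=%d" % (end + 1),          # assuming --end is exclusive
                "--output=gzip:pages-%02d.xml.gz" % i,
            ]))
        for p in procs:
            p.wait()
        # The per-slice files could then be bundled into a single archive,
        # which ties back to the multi-file distribution idea above.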
2010/12/15 Diederik van Liere dvanli...@gmail.com:
However, if the export functionality is primarily used by Wikimedia and nobody else, then we might consider a different language. Or we make a standalone app that is not part of MediaWiki and whose use is only internal to Wikimedia. If ...