Andrew Dunbar wrote:
> Just a thought: wouldn't it be easier to generate dumps in parallel if
> we did away with the assumption that the dump would be in database
> order? The metadata in the dump provides the ordering info for the
> people that require it.
> 
> Andrew Dunbar (hippietrail)

I don't see how producing the dumps in a different order would allow
greater parallelism.
You can already launch several processes at different points of the set.
Giving each process one of every N articles would allow more balanced
pieces, but that's not important. You would also skip the work of
reading the old dump up to the offset, although that's reasonably fast.
The important reason for having them in this order is that it keeps
the pages in the same order as in the previous dump.
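
To make the two ways of splitting the work concrete, here is a rough
sketch in Python. It is only an illustration, not the actual dump code:
the function names, the plain list of page IDs and the worker count are
all made up for the example, and "one of every N articles" is read here
as simple round-robin assignment.

```python
import heapq
from itertools import chain


def contiguous_ranges(page_ids, workers):
    """Give each worker one consecutive slice of the ordered page IDs."""
    size, extra = divmod(len(page_ids), workers)
    chunks, start = [], 0
    for w in range(workers):
        # The first `extra` workers take one more page so nothing is left over.
        end = start + size + (1 if w < extra else 0)
        chunks.append(page_ids[start:end])
        start = end
    return chunks


def interleaved(page_ids, workers, block=1):
    """Deal blocks of `block` pages to the workers in round-robin fashion."""
    chunks = [[] for _ in range(workers)]
    for i in range(0, len(page_ids), block):
        chunks[(i // block) % workers].extend(page_ids[i:i + block])
    return chunks


def concatenate(chunks):
    """Join the workers' outputs back to back."""
    return list(chain.from_iterable(chunks))


def merge_in_order(chunks):
    """Merge already-sorted worker outputs back into one sorted sequence."""
    return list(heapq.merge(*chunks))
```

With contiguous ranges, concatenating the pieces gives back the original
order directly. With the interleaved split the pieces have to be merged
again to recover that order, which is the extra step you take on if you
want the result to line up with the previous dump.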


_______________________________________________
Wikitech-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
