Diederik van Liere wrote:
> To continue the discussion on how to improve performance, would it be 
> possible to distribute the dumps as a 7z / gz / other format archive 
> containing multiple smaller XML files? It's quite tricky to split a very 
> large XML file into smaller valid XML files, and if the dumping process is 
> already parallelized, then we do not have to cat the different XML files into 
> one large XML file; instead we can distribute multiple smaller 
> parallelized files.
> 
> best,
> 
> Diederik

That has already been done for enwiki.


_______________________________________________
Wikitech-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
