On 1/7/09 9:16 PM, Robert Rohde wrote:
> Yes, you are right about that.  For bulk transport and storage it is
> not a big improvement.
>
> However, to work with ruwiki, for example, one generally needs to
> decompress it to the full 170 GB.  To work with enwiki's full revision
> history, if such a dump is ever to exist again, would probably
> decompress to ~2 TB.  7z and bz2 are not great formats if one wants to
> extract only portions of the dump since there are few tools that would
> allow one to do so without first reinflating the whole file.  Hence,
> one of the advantages I see in my format is being able to have a dump
that is still <10% the full inflated size while also being able to
> parse out selected articles or selected revisions in a straightforward
> manner.

*nod* this is an attractive option for that sort of case; you can pull 
the metadata and grep for interesting changes in a much more manageable 
file.
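(For illustration: even with a plain bz2 dump, the metadata can be streamed out without ever inflating the full file to disk, though true random access still means reading from the start of the stream. A minimal Python sketch, assuming the standard `<title>` tags of the XML dump format; the function name and regex-based parsing are illustrative only, not a robust dump parser:)

```python
import bz2
import re

def stream_titles(path):
    """Yield page titles from a bz2-compressed XML dump,
    decompressing incrementally instead of inflating to disk."""
    title_re = re.compile(r"<title>(.*?)</title>")
    # bz2.open in text mode decompresses lazily, line by line
    with bz2.open(path, "rt", encoding="utf-8") as f:
        for line in f:
            m = title_re.search(line)
            if m:
                yield m.group(1)
```

This keeps memory use flat regardless of dump size, which is the practical difference from first decompressing the whole archive and then grepping it.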

Note we'll also want to consider different options for breaking up 
the dump into smaller files, which makes parallel generation more 
feasible as well as assisting some downloaders.

-- brion

_______________________________________________
Wikitech-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
