--- On Tue, 16/3/10, Kevin Webb <[email protected]> wrote:

> From: Kevin Webb <[email protected]>
> Subject: Re: [Xmldatadumps-admin-l] 2010-03-11 01:10:08: enwiki Checksumming pages-meta-history.xml.bz2 :D
> To: "Tomasz Finc" <[email protected]>
> CC: "Wikimedia developers" <[email protected]>, [email protected], [email protected]
> Date: Tuesday, 16 March 2010, 21:10
>
> I just managed to finish decompression. That took about 54 hours on an
> EC2 2.5x unit CPU. The final data size is 5469GB.
>
> As the process just finished I haven't been able to check the
> integrity of the XML; however, the bzip stream itself appears to be
> good.
>
> As was mentioned previously, it would be great if you could compress
> future archives using pbzip2 to allow for parallel decompression. As I
> understand it, the pbzip2 files are backward compatible with all
> existing bzip2 utilities.

Yes, they are :-).

Regards,
F.

> Thanks again for all your work on this!
> Kevin
>
> On Tue, Mar 16, 2010 at 4:05 PM, Tomasz Finc <[email protected]> wrote:
> > Tomasz Finc wrote:
> >> New full history en wiki snapshot is hot off the presses!
> >>
> >> It's currently being checksummed which will take a while for 280GB+ of
> >> compressed data but for those brave souls willing to test please grab it
> >> from
> >>
> >> http://download.wikipedia.org/enwiki/20100130/enwiki-20100130-pages-meta-history.xml.bz2
> >>
> >> and give us feedback about its quality. This run took just over a month
> >> and gained a huge speed up after Tim's work on re-compressing ES. If we
> >> see no hiccups with this data snapshot, I'll start mirroring it to other
> >> locations (internet archive, amazon public data sets, etc).
> >>
> >> For those not familiar, the last successful run that we've seen of this
> >> data goes all the way back to 2008-10-03. That's over 1.5 years of
> >> people waiting to get access to these data bits.
> >>
> >> I'm excited to say that we seem to have it :)
> >
> > So now that we've had it for a couple of days .. can I get a status
> > report from someone about its quality?
> >
> > Even if you had no issues please let us know so that we start mirroring.
> >
> > --tomasz
> >
> > _______________________________________________
> > Xmldatadumps-admin-l mailing list
> > [email protected]
> > https://lists.wikimedia.org/mailman/listinfo/xmldatadumps-admin-l

_______________________________________________
Wikitech-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
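
Kevin's note above says the bzip stream looks good but the XML has not been checked yet. One way that follow-up check might be done is sketched below in Python. This is only an illustration under stated assumptions, not what anyone in the thread actually ran: the function names `md5_of_file` and `count_xml_elements` are hypothetical, and MD5 is assumed as the checksum only because the thread does not say which one is used. Python's `bz2.open` reads multi-stream files, which is the kind of file a parallel compressor such as pbzip2 writes.

```python
import bz2
import hashlib
import xml.etree.ElementTree as ET


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of the compressed file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def count_xml_elements(path):
    """Stream-decompress a .bz2 file and parse the result as XML.

    Returns the number of closed elements seen. A damaged bzip2 stream
    raises OSError or EOFError; malformed XML raises ET.ParseError.
    """
    count = 0
    with bz2.open(path, "rb") as stream:
        for _event, elem in ET.iterparse(stream, events=("end",)):
            count += 1
            # Drop the element's text and children once it is closed.
            # Assumption: for a multi-terabyte dump this alone is not
            # enough, since the root still keeps the emptied child
            # elements; a real run would also have to prune those.
            elem.clear()
    return count
```

Calling `md5_of_file` on the downloaded archive and comparing the result with the published checksum covers the download itself; `count_xml_elements` then reads the whole stream once, so it fails on either a broken bzip2 stream or XML that is not well-formed.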
