On Wed, Jan 22, 2014 at 10:31 AM, Matthew Flaschen <mflasc...@wikimedia.org> wrote:

> On 01/21/2014 09:47 PM, Amir Ladsgroup wrote:
>
>> One of the things I can't understand is why we are extracting summaries
>> of pages for Yahoo. Is it our job to do it? The dumps are really huge,
>> e.g. for wikidata: <http://dumps.wikimedia.org/wikidatawiki/20140106/>
>>
>> wikidatawiki-20140106-abstract.xml
>> <http://dumps.wikimedia.org/wikidatawiki/20140106/wikidatawiki-20140106-abstract.xml>
>> 14.1 GB
>>
>> Compare it to the full history:
>>
>> wikidatawiki-20140106-pages-meta-history.xml.bz2
>> <http://dumps.wikimedia.org/wikidatawiki/20140106/wikidatawiki-20140106-pages-meta-history.xml.bz2>
>> 8.8 GB
>
> That's because the Yahoo one isn't compressed.

Why? Can we make it compressed? It's really annoying to see that huge file
there for almost no reason.

> I'm not sure if Yahoo still uses those abstracts, but I wouldn't be
> surprised at all if other people are.
>
> Matt Flaschen
>
> _______________________________________________
> Wikitech-l mailing list
> Wikitech-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l

-- 
Amir