On 01/21/2014 09:47 PM, Amir Ladsgroup wrote:
> One of the things I can't understand is why we are extracting summaries of
> pages for Yahoo? Is it our job to do it? The dumps are really huge.
> E.g. for wikidata: <http://dumps.wikimedia.org/wikidatawiki/20140106/>
>
> wikidatawiki-20140106-abstract.xml
> <http://dumps.wikimedia.org/wikidatawiki/20140106/wikidatawiki-20140106-abstract.xml>
> 14.1 GB
>
> Compare it to the full history:
> wikidatawiki-20140106-pages-meta-history.xml.bz2
> <http://dumps.wikimedia.org/wikidatawiki/20140106/wikidatawiki-20140106-pages-meta-history.xml.bz2>
> 8.8 GB

That's because the Yahoo one isn't compressed.
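To get a rough sense of why that makes the sizes plausible: bz2 compresses highly repetitive XML very well, so an uncompressed abstract dump can easily outweigh a bz2-compressed full-history dump. A minimal sketch (the sample XML here is made up purely for illustration, not real dump content):

```python
import bz2

# Hypothetical repetitive XML, loosely in the spirit of an abstracts dump.
record = (
    "<doc><title>Some page</title>"
    "<abstract>A short repetitive summary of the page.</abstract></doc>\n"
)
raw = (record * 10_000).encode("utf-8")

compressed = bz2.compress(raw)

ratio = len(raw) / len(compressed)
print(f"raw: {len(raw)} bytes, bz2: {len(compressed)} bytes, "
      f"ratio: {ratio:.0f}x")
```

On repetitive markup like this, bz2 routinely achieves a compression ratio of an order of magnitude or more, which is consistent with the uncompressed 14.1 GB abstract file looming over the 8.8 GB compressed history file.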

I'm not sure if Yahoo still uses those abstracts, but I wouldn't be surprised at all if other people are.

Matt Flaschen


_______________________________________________
Wikitech-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l