> I'm doing some analysis on the Wikipedia image metadata and seeing some
> missing image rows in the SQL dumps.
>
> I downloaded
> enwiki-latest-image.sql, enwiki-latest-imagelinks.sql
> and enwiki-latest-oldimage.sql from
> http://dumps.wikimedia.org/enwiki/latest/
>
> I picked a page, 25041,
> http://en.wikipedia.org/wiki/Special:Export/Lockheed_P-38_Lightning
>
> I get 39 links from
> "select il_to from imagelinks where il_from = 25041"
>
> When I query the image table for these, only 8 of the 39 appear.
> Some of the missing files are 050218-F-1234P-076.jpg, 020930-O-9999G-017.jpg
>
> I grepped the original mysql file for these and get nothing.
>
> I can see the original file here though:
> http://en.wikipedia.org/wiki/File:050218-F-1234P-076.jpg
>
> I did a select count and got a total of 849,801 rows. Seems low for the
> total # of wikipedia images.
>
> Any ideas why I'm getting missing data?
>
> --
> @tommychheng
> http://tommy.chheng.com

The enwiki image tables only contain local images, i.e. files uploaded
directly to the English Wikipedia. Files served from a foreign
repository aren't included, which is why imagelinks can point at names
that have no row in the image table. In the case of Wikipedia the only
foreign repository you have to worry about is Wikimedia Commons. If you
download the corresponding image dumps from Commons, you should find
your missing rows there. For more info on foreign repos see
http://www.mediawiki.org/wiki/Manual:$wgForeignFileRepos
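
To make the lookup concrete, here is a rough sketch of how the links for
one page could be split into "local", "Commons" and "still unresolved".
It is only an illustration under some assumptions: it uses an in-memory
SQLite database instead of MySQL, the table name commons_image is a
made-up name for wherever you load the Commons image dump, the helper
names build_demo_db and classify_links are hypothetical, and the rows are
toy data (including which files are treated as living on Commons), not
the real dump contents. A local row is checked first, on the assumption
that a local file takes precedence over a Commons file of the same name.

```python
import sqlite3


def build_demo_db():
    """Create a tiny in-memory stand-in for the three tables (toy data only)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE imagelinks (il_from INTEGER, il_to TEXT);
        CREATE TABLE image (img_name TEXT PRIMARY KEY);          -- enwiki, local files
        CREATE TABLE commons_image (img_name TEXT PRIMARY KEY);  -- hypothetical name
        """
    )
    conn.executemany(
        "INSERT INTO imagelinks VALUES (?, ?)",
        [
            (25041, "050218-F-1234P-076.jpg"),
            (25041, "020930-O-9999G-017.jpg"),
            (25041, "Local_example.png"),    # made-up local file
            (25041, "Missing_example.jpg"),  # made-up link with no file anywhere
            (1, "Other.jpg"),                # link from a different page
        ],
    )
    conn.execute("INSERT INTO image VALUES ('Local_example.png')")
    conn.executemany(
        "INSERT INTO commons_image VALUES (?)",
        [("050218-F-1234P-076.jpg",), ("020930-O-9999G-017.jpg",), ("Other.jpg",)],
    )
    return conn


def classify_links(conn, page_id):
    """Return (local, commons, unresolved) file-name lists for one page."""
    rows = conn.execute(
        """
        SELECT il.il_to,
               (l.img_name IS NOT NULL) AS in_local,
               (c.img_name IS NOT NULL) AS in_commons
        FROM imagelinks AS il
        LEFT JOIN image AS l ON l.img_name = il.il_to
        LEFT JOIN commons_image AS c ON c.img_name = il.il_to
        WHERE il.il_from = ?
        ORDER BY il.il_to
        """,
        (page_id,),
    ).fetchall()

    local, commons, unresolved = [], [], []
    for name, in_local, in_commons in rows:
        if in_local:
            local.append(name)       # found in the enwiki image table
        elif in_commons:
            commons.append(name)     # only found in the Commons image table
        else:
            unresolved.append(name)  # in neither table
    return local, commons, unresolved


if __name__ == "__main__":
    local, commons, unresolved = classify_links(build_demo_db(), 25041)
    print("local:", local)
    print("commons:", commons)
    print("unresolved:", unresolved)
```

Against the real data the same two LEFT JOINs should work in MySQL once
the Commons image table is loaded under a different name or into a
separate database. Anything that still ends up in the unresolved list is
a link to a file that exists in neither dump.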

--
-bawolff

_______________________________________________
Wiki-research-l mailing list
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l