I'm analyzing Wikipedia image metadata and am seeing missing image rows in the SQL dumps.
I downloaded enwiki-latest-image.sql, enwiki-latest-imagelinks.sql and enwiki-latest-oldimage.sql from http://dumps.wikimedia.org/enwiki/latest/

I picked one page, ID 25041 (http://en.wikipedia.org/wiki/Special:Export/Lockheed_P-38_Lightning), and got 39 links from:

    select il_to from imagelinks where il_from = 25041

When I query the image table for these, only 8 of the 39 appear. Some of the missing files are 050218-F-1234P-076.jpg and 020930-O-9999G-017.jpg. I grepped the original MySQL dump file for these names and found nothing, yet I can see the file itself here: http://en.wikipedia.org/wiki/File:050218-F-1234P-076.jpg

A SELECT COUNT(*) on the image table returns 849,801 rows in total, which seems low for the total number of Wikipedia images.

Any ideas why I'm getting missing data?

--
@tommychheng
http://tommy.chheng.com
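The lookup described above (which `il_to` targets of a page have no row in the image table) can be written as a single anti-join instead of querying names one by one. The sketch below is only an illustration under stated assumptions: it uses Python's built-in `sqlite3` with an in-memory database rather than the real MySQL import, a reduced two-table schema (`imagelinks(il_from, il_to)` and `image(img_name)`), and hypothetical toy rows. `P-38_example.jpg` is a made-up "present" file; only the two missing names come from the post. The same `LEFT JOIN ... IS NULL` query should also run in MySQL against the loaded dump once the `?` placeholder is replaced with the page ID.

```python
import sqlite3


def missing_images(conn, page_id):
    """Return the il_to targets of page_id that have no matching img_name row."""
    rows = conn.execute(
        """
        SELECT il.il_to
        FROM imagelinks AS il
        LEFT JOIN image AS img ON img.img_name = il.il_to
        WHERE il.il_from = ? AND img.img_name IS NULL
        ORDER BY il.il_to
        """,
        (page_id,),
    ).fetchall()
    return [row[0] for row in rows]


def build_demo_db():
    """Hypothetical toy data with a reduced schema, not the real dump contents."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE imagelinks (il_from INTEGER, il_to TEXT);
        CREATE TABLE image (img_name TEXT PRIMARY KEY);
        """
    )
    conn.executemany(
        "INSERT INTO imagelinks VALUES (?, ?)",
        [
            (25041, "P-38_example.jpg"),  # hypothetical file that does exist
            (25041, "050218-F-1234P-076.jpg"),
            (25041, "020930-O-9999G-017.jpg"),
        ],
    )
    conn.execute("INSERT INTO image VALUES (?)", ("P-38_example.jpg",))
    return conn


if __name__ == "__main__":
    demo = build_demo_db()
    print(missing_images(demo, 25041))
```

Running the names this returns through grep (or a `SELECT` on other dumps) is then a single batch step rather than 31 separate lookups.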
_______________________________________________
Wiki-research-l mailing list
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
