> I'm doing some analysis on the wikipedia image metadata and seeing some
> missing image rows in the sql dumps.
>
> I downloaded
> enwiki-latest-image.sql, enwiki-latest-imagelinks.sql,
> enwiki-latest-imagelinks.sql
> and enwiki-latest-oldimage.sql from
> http://dumps.wikimedia.org/enwiki/latest/
>
> I picked a page, 25041,
> http://en.wikipedia.org/wiki/Special:Export/Lockheed_P-38_Lightning
>
> I get 39 links from
> "select il_to from imagelinks where il_from = 25041"
>
> When I query the image table for these, only 8 of the 39 appear.
> Some of the missing files are 050218-F-1234P-076.jpg, 020930-O-9999G-017.jpg
>
> I grepped the original mysql file for these and get nothing.
>
> I can see the original file here though:
> http://en.wikipedia.org/wiki/File:050218-F-1234P-076.jpg
>
> I did a select count and got a total of 849,801 rows. Seems low for the
> total # of wikipedia images.
>
> Any ideas why i'm getting missing data?
>
> --
> @tommychheng
> http://tommy.chheng.com

The enwiki image tables only contain local images, that is, files uploaded directly to the English Wikipedia. Foreign images, which are served from a shared repository, aren't included. In the case of Wikipedia, the only foreign repository you have to worry about is Wikimedia Commons. If you download the relevant dumps from Commons, you should have your missing info.

For more info on foreign repos, see http://www.mediawiki.org/wiki/Manual:$wgForeignFileRepos

--
-bawolff
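
To make the lookup concrete, here is a rough sketch of resolving a page's image links against the local image table first and the Commons image table second. It uses an in-memory SQLite database with toy rows rather than the real dumps. The table name `commons_image`, the helper names, and the `Local_example.jpg` / `Redlinked_example.jpg` rows are assumptions made up for illustration; only `imagelinks`, `il_from`, `il_to` and the `image` table come from the thread, and `img_name` is the name column of MediaWiki's image table. The local table is checked first on the assumption that a local file takes precedence over a Commons file of the same name.

```python
import sqlite3


def build_toy_db():
    """Create a tiny in-memory stand-in for the imported dumps (toy data only)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE imagelinks (il_from INTEGER, il_to TEXT);
        CREATE TABLE image (img_name TEXT PRIMARY KEY);          -- enwiki dump
        CREATE TABLE commons_image (img_name TEXT PRIMARY KEY);  -- hypothetical: commonswiki dump

        INSERT INTO imagelinks VALUES
            (25041, 'Local_example.jpg'),
            (25041, '050218-F-1234P-076.jpg'),
            (25041, 'Redlinked_example.jpg');
        INSERT INTO image VALUES ('Local_example.jpg');
        INSERT INTO commons_image VALUES ('050218-F-1234P-076.jpg');
        """
    )
    return conn


def classify_links(conn, page_id):
    """Split a page's image links into local, Commons, and unresolved names."""
    rows = conn.execute(
        """
        SELECT il.il_to,
               l.img_name IS NOT NULL AS is_local,
               c.img_name IS NOT NULL AS on_commons
        FROM imagelinks AS il
        LEFT JOIN image AS l ON l.img_name = il.il_to
        LEFT JOIN commons_image AS c ON c.img_name = il.il_to
        WHERE il.il_from = ?
        ORDER BY il.il_to
        """,
        (page_id,),
    ).fetchall()

    result = {"local": [], "commons": [], "missing": []}
    for name, is_local, on_commons in rows:
        if is_local:
            result["local"].append(name)    # found in the enwiki image table
        elif on_commons:
            result["commons"].append(name)  # only in the Commons image table
        else:
            result["missing"].append(name)  # in neither table
    return result


if __name__ == "__main__":
    print(classify_links(build_toy_db(), 25041))
```

Against the real data, the same join would run in MySQL after importing the commonswiki image dump into a separately named table. Names that land in the "missing" bucket are links to files that exist in neither repository.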
