Dispenser kindly made a list of DjVu files on Commons linking to an Internet Archive item, with some information such as global usage: https://toolserver.org/~dispenser/temp/djvu2archive.org.txt (just change the extension to .csv to open it as a spreadsheet; it's tab-separated). It's about 5000 books with 6-200 global usages and 5000 outside that range (which probably means completely unused apart from some talk pages or the like, or with most of the text already living on wiki pages). If I manage to convince a "slash-admin", I'll get those 5000 re-OCR'd; otherwise I'll need to do it manually, so suggestions on priorities are welcome. :)

Nemo

P.s.:
The query used: http://pastebin.com/L4EXDY5F and another one: http://pastebin.com/avg3LYti

-- 2> /dev/null; date; echo '
SELECT /*SLOW_OK*/
  CONCAT("[[:File:", REPLACE(img_name, "_", " "), "]]") AS "File",
  el_from,
  img_size,
  REPLACE(img_user_text, "_", " ") AS "Uploader",
  img_timestamp AS "Timestamp",
  el_to,
  (SELECT COUNT(*) FROM globalimagelinks WHERE gil_to=page_title) AS "Usage",
  (SELECT COUNT(*) FROM oldimage WHERE oi_name=page_title) AS "Reuploads",
  SUBSTRING_INDEX(el_to,"/",-1) AS "Archive_Name"
FROM image
JOIN page ON page_namespace=6 AND page_title=CONVERT(img_name USING latin1)
JOIN externallinks ON el_from=page_id
WHERE img_media_type="BITMAP"
AND img_major_mime="image" AND img_minor_mime="vnd.djvu"
AND (el_index LIKE "http://org.archive.%/details/_%"
OR   el_index LIKE "https://org.archive.%/details/_%")
;-- ' | sql -r commonswiki_p > ~/public_html/temp/djvu2archive.org.txt; date;
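For picking priorities out of the resulting file, something like this could sort it by the "Usage" column (global usage count), highest first. A minimal sketch only: the column names follow the query above, and the sample rows here are made up, not real data from the dump.

```python
import csv
import io

# Made-up sample in the same tab-separated layout the query produces.
sample_tsv = (
    "File\tel_from\timg_size\tUploader\tTimestamp\tel_to\tUsage\tReuploads\tArchive_Name\n"
    "[[:File:Book A.djvu]]\t1\t100\tAlice\t20130101000000\thttp://archive.org/details/booka\t150\t2\tbooka\n"
    "[[:File:Book B.djvu]]\t2\t200\tBob\t20130102000000\thttp://archive.org/details/bookb\t3\t0\tbookb\n"
)

def prioritize(tsv_text):
    """Return rows sorted by global usage, most-used files first."""
    rows = list(csv.DictReader(io.StringIO(tsv_text), delimiter="\t"))
    return sorted(rows, key=lambda r: int(r["Usage"]), reverse=True)

for row in prioritize(sample_tsv):
    print(row["File"], row["Usage"])
```

The same function would work on the full dump by reading the .txt file instead of the sample string.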

_______________________________________________
Wikisource-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikisource-l
