On Thu, Sep 9, 2010 at 10:54 PM, Jamie Morken <[email protected]> wrote:
> Hi all,
>
> This is a preliminary list of what needs to be done to generate image
> dumps. If anyone can help with #2 by providing the access log of image
> usage stats, please send me an email!
>
> 1. run wikix to generate the list of images for a given wiki, e.g. enwiki
>
> 2. sort the image list by usage frequency taken from the access log files
Hi,
It would be great to have these image dumps! I wonder if a different
dump might be worth it for a different scenario:
* The user only wants the images for a small set of page ids, e.g. 1000 pages.
What would be the proper way to get these images without downloading
the large dumps?
a. Parse the actual HTML pages to get the real image urls (plus
license info), then download the images?
b. Try to find the real image urls using the commons wikitext
dump (and parse license info, ...)?
Both approaches seem complicated, so maybe a different dump would be helpful:
Page id --> list of [ image id | real url | type (original | dim_xy | thumb) | license ]
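For a small set of page ids, a middle ground between the two approaches
might be the MediaWiki web API, which can list the images used on given
pages and resolve each to its real upload url (action=query with
generator=images and prop=imageinfo). A minimal sketch in Python — the
helper names are mine, and the sample response below is a hand-written,
simplified illustration, not actual API output:

```python
import urllib.parse

API = "https://en.wikipedia.org/w/api.php"  # assumed endpoint for enwiki

def imageinfo_query_url(page_ids):
    """Build an API query listing the images used on the given pages,
    resolving each image to its real url (prop=imageinfo, iiprop=url)."""
    params = {
        "action": "query",
        "generator": "images",
        "pageids": "|".join(str(p) for p in page_ids),
        "prop": "imageinfo",
        "iiprop": "url",
        "format": "json",
    }
    return API + "?" + urllib.parse.urlencode(params)

def image_urls(response):
    """Extract sorted (title, url) pairs from a decoded API response."""
    pages = response.get("query", {}).get("pages", {})
    return sorted(
        (page["title"], info["url"])
        for page in pages.values()
        for info in page.get("imageinfo", [])
    )

# Hand-written, simplified example of the response shape:
sample = {
    "query": {
        "pages": {
            "-1": {
                "title": "File:Example.jpg",
                "imageinfo": [{"url": "http://upload.wikimedia.org/x/Example.jpg"}],
            }
        }
    }
}

print(imageinfo_query_url([12, 42]))
print(image_urls(sample))
```

This would still leave license info to be fetched separately, which is
part of why a dedicated dump with the license column included seems
attractive.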
regards
_______________________________________________
Wikitech-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l