On Thu, Sep 9, 2010 at 10:54 PM, Jamie Morken <[email protected]> wrote:
> Hi all,
>
> This is a preliminary list of what needs to be done to generate images 
> dumps.  If anyone can help with #2 to provide the access log of image usage 
> stats please send me an email!
>
> 1. run wikix to generate list of images for a given wiki ie. enwiki
>
> 2. sort the image list based on usage frequency from access log files

Hi,

It would be great to have these image dumps! I wonder if a different
dump might be worth it for a different scenario:

* The user only wants to get the photos for a small set of page ids, e.g. 1000 pages

What would be the proper way to get these photos without downloading
the large dumps?

    a. Parse the actual HTML pages to get the real image URLs (plus
license info), and then download the images?

    b. Try to find the real image URLs using the Commons wikitext
dump (and parse the license info from the wikitext, ...)?

Both approaches seem complicated, so maybe a different kind of dump would be helpful:

Page id  -->  List of [ image id | real URL | type (original |
dim_xy | thumb) | license ]
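For a small set of page ids, a third route (no dumps at all) is to ask the MediaWiki web API directly for the images used on each page, plus each file's direct URL and license metadata. A minimal sketch, assuming the standard api.php endpoint and its `action=query` / `generator=images` / `prop=imageinfo` parameters; the sample response below is fabricated to show the shape of the data, not real output:

```python
from urllib.parse import urlencode

# Assumed endpoint; any MediaWiki wiki exposes the same api.php interface.
API = "https://en.wikipedia.org/w/api.php"

def imageinfo_url(pageids):
    """Build an api.php query asking for the images embedded in the given
    pages, with each file's direct URL and license metadata."""
    params = {
        "action": "query",
        "pageids": "|".join(str(p) for p in pageids),
        "generator": "images",          # iterate over the files used on the pages
        "prop": "imageinfo",
        "iiprop": "url|extmetadata",    # real upload URL + license fields
        "format": "json",
    }
    return API + "?" + urlencode(params)

def extract_files(response):
    """Flatten a decoded imageinfo response into (title, url, license) tuples."""
    out = []
    for page in response.get("query", {}).get("pages", {}).values():
        for info in page.get("imageinfo", []):
            meta = info.get("extmetadata", {})
            lic = meta.get("LicenseShortName", {}).get("value", "unknown")
            out.append((page["title"], info["url"], lic))
    return out
```

One HTTP GET per batch of ids would then replace both the HTML-parsing and the wikitext-dump approaches, at the cost of hitting the live site instead of working offline.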

Regards

_______________________________________________
Wikitech-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
