Jamie Morken wrote:
> 
> Hi,
> 
>> What do you mean by "opening"?
>> enwiki pages-meta-history is hard due to its size, not because Ariel or
>> Tomasz are any more stupid than a volunteer.
>> I trust them to do it at least as well as a volunteer would.
>> Of course, if you can do better, I'm all for giving you a shell to
>> fix it, and the scripts are there for improvements as well.
> 
> I wasn't aware that the dump scripts were publicly available. Where can they
> be downloaded from, or are they part of MediaWiki?

They are in http://svn.wikimedia.org/viewvc/mediawiki/trunk/backup/
although the files look a bit old, so perhaps there are some uncommitted
changes?
/me looks for offenders


>> What exactly do you need regarding the images? Which image dumps do you
>> want? Do you have enough terabytes to store them?
>> Access to that data has been given on request in the past.
>> If it's not there, it's because:
>> a) Those dumps would take a lot of space.
> 
> I don't think that is a valid reason; thumbnail dumps of all the
> images from enwiki would probably be smaller than the current
> enwiki pages-meta-history bz2 file.

We have thumbs in lots of sizes. Which size of thumbs do you want? It's
easy to tar all the images used on a wiki, since that's tracked in the
database, but it's not at all easy to know at which exact size each of
them was used.

enwiki has a total of 858979 local files, which sum to 229 GB (and there's
still Commons). Its articles use 2357967 unique images (37050694 uses in
total). Assuming 20 KB per thumb (is that a good value?), that's about
47 GB, more than the 31.9 GB of the (really compressed)
pages-meta-history.xml.7z, but we would first need to agree on a thumb
size. The two would tie at about 14 KB per thumb.

Even if every thumb were unrealistically small, 1 KB each, they would
still add up to about 2.4 GB.
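The size estimate above is easy to re-check with a few lines of Python. This is only a rough sketch: it assumes decimal units (1 KB = 1000 bytes, 1 GB = 10^9 bytes), exactly one thumb per unique image, and no archive overhead; the constants are the figures quoted in this message, and the helper names are hypothetical.

```python
# Back-of-the-envelope size of an enwiki thumbnail dump.
# Assumptions: decimal units, one thumb per unique image, no tar overhead.
UNIQUE_IMAGES = 2_357_967   # unique images used in enwiki articles
HISTORY_7Z_GB = 31.9        # size of pages-meta-history.xml.7z


def dump_size_gb(kb_per_thumb: float, n_images: int = UNIQUE_IMAGES) -> float:
    """Total dump size in GB if every image contributes one thumb of this size."""
    return n_images * kb_per_thumb * 1000 / 1e9


def break_even_kb(target_gb: float = HISTORY_7Z_GB,
                  n_images: int = UNIQUE_IMAGES) -> float:
    """Average thumb size in KB at which the dump would equal target_gb."""
    return target_gb * 1e9 / n_images / 1000


if __name__ == "__main__":
    for kb in (1, 14, 20):
        print(f"{kb:>2} KB/thumb -> {dump_size_gb(kb):5.1f} GB")
    print(f"break-even vs. the 7z history dump: {break_even_kb():.1f} KB/thumb")
```

With these inputs, 20 KB per thumb comes to roughly 47 GB and the break-even point is about 13.5 KB per thumb; real thumbs vary a lot in size, so the average is the number that would need to be agreed on.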


>> b) Nobody feels particularly interested in them.
> I disagree; there has been a lot of interest in having image dumps
> available for download. There was a recent discussion on this on the
> xmldatadumps list that basically concluded that subsets of images
> (i.e. enwiki thumbnails) would be useful.

I am unable to find it, although a thread like that does ring a
bell.


> There are wiki pages dedicated
> to the topic of how to download images, because there are no
> image dumps available. Is the Wikimedia Foundation interested in hosting
> image dumps again? If so, maybe we can start a discussion on how
> to write the script and which image dumps to start with.
> 
> cheers,
> Jamie



_______________________________________________
Wikitech-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
