I think having access to them on the Commons repository would be much easier to handle. A subset should be good enough.
Having 11 TB of images requires huge research capabilities in order to handle and work with all of them. Maybe a special API or advanced API functions would give people enough access while saving the bandwidth and the hassle of handling this behemoth collection.

bilal
--
Verily, with hardship comes ease.

On Fri, Jan 8, 2010 at 1:57 PM, Tomasz Finc <[email protected]> wrote:
> William Pietri wrote:
> > On 01/07/2010 01:40 AM, Jamie Morken wrote:
> >> I have a suggestion for wikipedia!! I think that the database dumps
> >> including the image files should be made available by a wikipedia
> >> bittorrent tracker so that people would be able to download the
> >> wikipedia backups including the images (which currently they can't do)
> >> and also so that wikipedia's bandwidth costs would be reduced. [...]
> >
> > Is the bandwidth used really a big problem? Bandwidth is pretty cheap
> > these days, and given Wikipedia's total draw, I suspect the occasional
> > dump download isn't much of a problem.
>
> No, bandwidth is not really the problem here. I think the core issue is
> to have bulk access to images.
>
> There have been a number of these requests in the past, and after talking
> back and forth, it has usually been the case that a smaller subset of
> the data works just as well.
>
> A good example of this was the Deutsche Fotothek archive made late last
> year:
>
> http://download.wikipedia.org/images/Deutsche_Fotothek.tar ( 11GB )
>
> This provided an easily retrievable, high-quality subset of our image
> data which researchers could use.
>
> Now, if we were to snapshot image data and store it for a particular
> project, the amount of duplicate image data would become significant.
> That's because we re-use a ton of image data between projects, and
> rightfully so.
> If instead we package all of Commons into a tarball, then we get roughly
> 6 TB of image data, which after numerous conversations has been a bit
> more than most people want to process.
>
> So what does everyone think of going down the collections route?
>
> If we provide enough different and up-to-date ones, then we could easily
> give people a large but manageable amount of data to work with.
>
> If there is a page already for this then please feel free to point me to
> it; otherwise I'll create one.
>
> --tomasz

_______________________________________________
Wikitech-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
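[Editorial aside: Tomasz's point that per-project snapshots would contain a lot of duplicate image data can be illustrated with a small sketch. A content-addressed store counts each distinct file once, no matter how many projects reference it. The `dedup_savings` function and the sample data below are invented for illustration, not part of any Wikimedia tooling.]

```python
import hashlib

def dedup_savings(files):
    """Given a mapping of file path -> raw bytes (e.g. images gathered
    from several per-project snapshots), report how many bytes a
    content-addressed store would save over storing every copy.

    Returns (total_bytes, unique_bytes, saved_bytes).
    """
    seen = set()          # SHA-1 digests of content already counted
    total = unique = 0
    for data in files.values():
        total += len(data)
        digest = hashlib.sha1(data).digest()
        if digest not in seen:   # first time we see this content
            seen.add(digest)
            unique += len(data)
    return total, unique, total - unique

# Hypothetical example: two projects re-using the same image.
snapshots = {
    "enwiki/logo.png": b"abc",
    "dewiki/logo.png": b"abc",   # duplicate content, different path
    "dewiki/map.png":  b"defg",
}
print(dedup_savings(snapshots))  # -> (10, 7, 3)
```

Under this model, the savings grow with every project that re-uses an image, which is why a single Commons tarball (or curated collections drawn from it) stores less than the sum of per-project snapshots.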
