In a message off-list, Platonides wrote:

>  I think pretty much everyone using them would want the last dump, so I
>  don't see a problem in keeping world readable just the last two dumps or
>  so (I chose the number two in the case someone started using one dump
>  and wanted to finish with that, and a new one was published in the
>  meantime).

This is incorrect. When I got the idea to analyze external link statistics, I
needed all old dumps I could get, and they took a lot of time to download.
There was neither disk space nor bandwidth on the toolserver, so I had
to use a server of my own. I now have all page.sql.gz and externallinks.sql.gz,
but only I can use them, because they are on my server and not on the 
toolserver.
These files now take 160 GB, which is a fraction of a 2 TB disk that cost
100 euro to purchase. We're talking disk space at the cost of a lunch.
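
For anyone wanting to repeat this, a minimal sketch of what fetching those
tables for several dump dates looks like. The dates below are placeholders,
not a real dump list; the actual dates have to be read off the directory
index at dumps.wikimedia.org for the wiki in question:

```shell
#!/bin/sh
# Hypothetical sketch: build the download URLs for the page and
# externallinks tables across several dump dates.
WIKI=enwiki
URLS=""
for DATE in 20100130 20100312; do      # placeholder dates
  for TABLE in page externallinks; do
    URLS="$URLS https://dumps.wikimedia.org/$WIKI/$DATE/$WIKI-$DATE-$TABLE.sql.gz"
  done
done
# Print one URL per line; pipe into "wget -c -i -" to download.
for U in $URLS; do echo "$U"; done
```

With wget's -c flag an interrupted transfer resumes instead of restarting,
which matters when the files add up to this many gigabytes.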

If we limited the toolserver to what most people would use, we could just
restrict it to dumps of the English and German Wikipedia, since that
is what the majority of users are interested in. That sort of
thinking will lead you wrong every time.

How hard can it be to get enough disk space on the toolserver? Many
chapters contribute money to its operation. Is that not enough?



-- 
   Lars Aronsson (l...@aronsson.se)
   Aronsson Datateknik - http://aronsson.se



_______________________________________________
Toolserver-l mailing list (Toolserver-l@lists.wikimedia.org)
https://lists.wikimedia.org/mailman/listinfo/toolserver-l
Posting guidelines for this list: 
https://wiki.toolserver.org/view/Mailing_list_etiquette
