That's a good point. I actually need a W* media dump for my work right now (including usage and all captions). If you do too, perhaps we should see how effectively we can compile one via a WMC tool. Alternatively, if WME can offer the same for a fee, I would be glad to pay for that.
S. 🌍🌏🌎🌑

On Thu, Apr 10, 2025, 2:57 PM Federico Leva (Nemo) <nemow...@gmail.com> wrote:

> On 08/04/25 18:08, Giuseppe Lavagetto wrote:
> > I’ve updated our Robot Policy[0], which was vastly outdated, the main
> > revision being from 2009.
>
> Thanks for working on an update! It seems there was a misalignment of
> expectations, which is in itself a problem to fix.
>
> > The new policy isn’t more restrictive than the older one for general
> > crawling of the site or the API; on the contrary we allow higher limits
> > than previously stated.
>
> I find this hard to believe, considering this new sentence for
> upload.wikimedia.org: «Always keep a total concurrency of at most 2, and
> limit your total download speed to 25 Mbps (as measured over 10 second
> intervals).»
>
> This is a ridiculously low limit. It's a speed which is easy to breach
> in casual browsing of Wikimedia Commons categories, let alone with any
> kind of media-related bots.
>
> At the suggested speed, it would take over 150 years for a person to
> download Wikimedia Commons files alone.
>
> Needless to say, I breached such a threshold all the time when I
> compiled the https://archive.org/details/wikimediacommons collection. I
> typically aimed to saturate my upload bandwidth at all times when
> updating it, so I must have tried to download at about 100 Mbps, and it
> still took me months. (I used to run those scripts from my home in
> Milan, downloading the files to an external HDD. I stopped updating the
> collection after 2016 in part because I don't have FTTH in Helsinki, and
> the daily downloads were far too big for any storage in Wikimedia Cloud.)
>
> I appreciate that some exceptions for Wikimedia Cloud bots were added
> after the discussion at
> https://phabricator.wikimedia.org/T391020#10716478 , but the fact
> remains that this comes off as a big change.
>
> On 09/04/25 19:10, AntiCompositeNumber wrote:
> > I'll just note that both API:Etiquette and the Robot Policy have been
> > incorporated by reference into the Terms of Use:
> > https://foundation.wikimedia.org/wiki/Policy:Terms_of_Use/en#12._API_Terms
> >
> > Undiscussed changes to the Terms of Use should be avoided.
>
> This is a good point.
>
> There are parts of the terms of use which assume the [[m:Right to fork]]
> is upheld by the availability of mirrored dumps. But the media tarballs
> have not been updated since 2012. Now in effect the WMF is explicitly
> saying that no mirrors are allowed for media, unless by gracious
> exemption to individual requesters.
>
> Best,
> Federico
> _______________________________________________
> Wikitech-l mailing list -- wikitech-l@lists.wikimedia.org
> To unsubscribe send an email to wikitech-l-le...@lists.wikimedia.org
> https://lists.wikimedia.org/postorius/lists/wikitech-l.lists.wikimedia.org/
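
For anyone who wants to sanity-check the figures quoted above, here is a rough sketch of the arithmetic. It only uses the numbers from the thread (25 Mbps, 100 Mbps, 150 years); the function names are made up for illustration, and it assumes decimal units (1 Mbps = 10^6 bit/s), a 365.25-day year, and a perfectly sustained transfer rate, none of which the policy text spells out.

```python
# Back-of-the-envelope check of the bandwidth figures quoted in this thread.
# Assumptions (not from the policy text): decimal units, a 365.25-day year,
# and a transfer that runs at the stated rate without interruption.

SECONDS_PER_YEAR = 365.25 * 24 * 3600


def bytes_in(years: float, mbps: float) -> float:
    """Bytes transferred in `years` at a sustained rate of `mbps` megabits/s."""
    return mbps * 1e6 / 8 * years * SECONDS_PER_YEAR


def years_for(total_bytes: float, mbps: float) -> float:
    """Years needed to transfer `total_bytes` at a sustained `mbps` megabits/s."""
    return total_bytes / (mbps * 1e6 / 8) / SECONDS_PER_YEAR


if __name__ == "__main__":
    # Data volume implied by "150 years at 25 Mbps".
    implied = bytes_in(150, 25)
    print(f"150 years at 25 Mbps: {implied / 1e15:.1f} PB")
    # The same volume at the ~100 Mbps mentioned for the archive.org runs.
    print(f"Same volume at 100 Mbps: {years_for(implied, 100):.1f} years")
```

This says nothing about the actual size of Commons; it only shows what data volume the "150 years" estimate corresponds to, and how the duration scales with the permitted rate.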