On 08/04/25 18:08, Giuseppe Lavagetto wrote:
> I’ve updated our Robot Policy[0], which was vastly outdated, the main
> revision being from 2009.
Thanks for working on an update! It seems there was a misalignment of
expectations, which is in itself a problem to fix.
> The new policy isn’t more restrictive than the older one for general
> crawling of the site or the API; on the contrary we allow higher limits
> than previously stated.
I find this hard to believe, considering this new sentence for
upload.wikimedia.org: «Always keep a total concurrency of at most 2, and
limit your total download speed to 25 Mbps (as measured over 10 second
intervals).»
This is a ridiculously low limit. It's a speed which is easy to breach
in casual browsing of Wikimedia Commons categories, let alone with any
kind of media-related bots.
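
To make the quoted limit concrete, here is a minimal Python sketch of one possible reading of "as measured over 10 second intervals". The policy sentence does not say how the measurement is done, so the fixed 10-second buckets, the decimal megabits (10^6 bits) and the names peak_mbps and within_limit are all my assumptions, not anything the policy or the WMF specifies.

```python
from collections import defaultdict

LIMIT_MBPS = 25   # figure taken from the quoted policy sentence
WINDOW_S = 10     # figure taken from the quoted policy sentence


def peak_mbps(samples, window_s=WINDOW_S):
    """Highest average rate over any fixed window, in Mbps.

    samples: iterable of (timestamp_seconds, bytes_received) pairs.
    Assumption: windows are fixed buckets [0, 10), [10, 20), ...
    and 1 Mbps means 10**6 bits per second.
    """
    buckets = defaultdict(int)
    for ts, nbytes in samples:
        buckets[int(ts // window_s)] += nbytes
    if not buckets:
        return 0.0
    return max(buckets.values()) * 8 / window_s / 1e6


def within_limit(samples, limit_mbps=LIMIT_MBPS):
    """True if no window exceeds the limit (hypothetical helper)."""
    return peak_mbps(samples) <= limit_mbps
```

Under these assumptions the budget is 31.25 MB per 10-second window, so two 20 MB originals fetched within the same window already exceed it.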
At the suggested speed, it would take a single person over 150 years
just to download the files on Wikimedia Commons.
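
The arithmetic behind that figure can be sketched in a few lines of Python. This only converts the 25 Mbps limit into a yearly volume and shows the total volume that "150 years" would imply. It assumes decimal units and an uninterrupted transfer at the cap, and it does not use any measured size of Commons; the helper name bytes_in is mine.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # about 31.56 million seconds


def bytes_in(years, mbps):
    """Bytes transferred in `years` at a constant `mbps` (10**6 bits/s)."""
    return mbps * 1e6 / 8 * SECONDS_PER_YEAR * years


# Volume one client can fetch in a year at the 25 Mbps cap, in TB.
per_year_tb = bytes_in(1, 25) / 1e12

# Volume implied by 150 years at that cap, in PB.
implied_pb = bytes_in(150, 25) / 1e15

print(f"{per_year_tb:.1f} TB per year, {implied_pb:.1f} PB in 150 years")
```

In other words, the cap works out to a bit under 100 TB per year, and 150 years at that rate corresponds to a little under 15 PB.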
Needless to say, I breached such a threshold all the time when I
compiled the https://archive.org/details/wikimediacommons collection. I
typically aimed to saturate my upload bandwidth at all times when
updating it, so I must have tried to download at about 100 Mbps, and it
still took me months. (I used to run those scripts from my home in
Milan, downloading the files to an external HDD. I stopped updating the
collection after 2016 in part because I don't have FTTH in Helsinki, and
the daily downloads were far too big for any storage in Wikimedia Cloud.)
I appreciate that some exceptions for Wikimedia Cloud bots were added
after the discussion at
https://phabricator.wikimedia.org/T391020#10716478 , but the fact
remains that this comes off as a big change.
On 09/04/25 19:10, AntiCompositeNumber wrote:
> I'll just note that both API:Etiquette and the Robot Policy have been
> incorporated by reference into the Terms of Use:
> https://foundation.wikimedia.org/wiki/Policy:Terms_of_Use/en#12._API_Terms
>
> Undiscussed changes to the Terms of Use should be avoided.
This is a good point.
There are parts of the terms of use which assume the [[m:Right to fork]]
is upheld by the availability of mirrored dumps. But the media tarballs
have not been updated since 2012. Now in effect the WMF is explicitly
saying that no mirrors are allowed for media, unless by gracious
exemption to individual requesters.
Best,
Federico
_______________________________________________
Wikitech-l mailing list -- wikitech-l@lists.wikimedia.org