That's a good point. I actually need a W* media dump now for my work (incl.
usage + all captions); if you do too, perhaps we should see how effectively
we can compile one via a WMC tool. Alternatively, if WME can offer the same
for a fee, I would be glad to pay for that.

S.

🌍🌏🌎🌑

On Thu, Apr 10, 2025, 2:57 PM Federico Leva (Nemo) <nemow...@gmail.com>
wrote:

> On 08/04/25 18:08, Giuseppe Lavagetto wrote:
> > I’ve updated our Robot Policy[0], which was vastly outdated, the main
> > revision being from 2009.
>
> Thanks for working on an update! It seems there was a misalignment of
> expectations, which is in itself a problem to fix.
>
> >
> > The new policy isn’t more restrictive than the older one for general
> > crawling of the site or the API; on the contrary we allow higher limits
> > than previously stated.
>
> I find this hard to believe, considering this new sentence for
> upload.wikimedia.org: «Always keep a total concurrency of at most 2, and
> limit your total download speed to 25 Mbps (as measured over 10 second
> intervals).»
>
> This is a ridiculously low limit. It's a speed which is easy to breach
> in casual browsing of Wikimedia Commons categories, let alone with any
> kind of media-related bots.
>
> At the suggested speed, it would take over 150 years for a person to
> download Wikimedia Commons files alone.
>
> Needless to say, I breached such a threshold all the time when I
> compiled the https://archive.org/details/wikimediacommons collection. I
> typically aimed to saturate my upload bandwidth at all times when
> updating it, so I must have tried to download at about 100 Mbps, and it
> still took me months. (I used to run those scripts from my home in
> Milan, downloading the files to an external HDD. I stopped updating the
> collection after 2016 in part because I don't have FTTH in Helsinki, and
> the daily downloads were far too big for any storage in Wikimedia Cloud.)
>
> I appreciate that some exceptions for Wikimedia Cloud bots were added
> after the discussion at
> https://phabricator.wikimedia.org/T391020#10716478 , but the fact
> remains that this comes off as a big change.
>
> On 09/04/25 19:10, AntiCompositeNumber wrote:
>  > I'll just note that both API:Etiquette and the Robot Policy have been
>  > incorporated by reference into the Terms of Use:
>  > https://foundation.wikimedia.org/wiki/Policy:Terms_of_Use/en#12._API_Terms
>  >
>  > Undiscussed changes to the Terms of Use should be avoided.
>
> This is a good point.
>
> There are parts of the terms of use which assume the [[m:Right to fork]]
> is upheld by the availability of mirrored dumps. But the media tarballs
> have not been updated since 2012. Now in effect the WMF is explicitly
> saying that no mirrors are allowed for media, unless by gracious
> exemption to individual requesters.
>
> Best,
>         Federico
> _______________________________________________
> Wikitech-l mailing list -- wikitech-l@lists.wikimedia.org
> To unsubscribe send an email to wikitech-l-le...@lists.wikimedia.org
> https://lists.wikimedia.org/postorius/lists/wikitech-l.lists.wikimedia.org/
>
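
For anyone who wants to check the numbers against the limit quoted above
(25 Mbps, measured over 10 second intervals), here is a minimal sketch of
the arithmetic. It assumes "Mbps" means 10^6 bits per second. The function
names and the 1 TB example size are hypothetical; the thread does not state
the size of the collection, so this is not a recomputation of the estimate
above.

```python
def window_budget_bytes(mbps: float, window_s: float) -> int:
    """Bytes that fit in one measurement window at the given rate.

    Assumes decimal megabits: 1 Mbps = 1_000_000 bits per second.
    """
    return int(mbps * 1_000_000 / 8 * window_s)


def transfer_seconds(total_bytes: int, mbps: float) -> float:
    """Seconds needed to move total_bytes at a sustained rate of mbps."""
    return total_bytes * 8 / (mbps * 1_000_000)


if __name__ == "__main__":
    # Byte budget per 10-second window at the quoted 25 Mbps limit.
    print(window_budget_bytes(25, 10))

    # Hypothetical 1 TB (10**12 bytes) of media, at 25 Mbps and at 100 Mbps.
    one_tb = 10**12
    for rate in (25, 100):
        days = transfer_seconds(one_tb, rate) / 86_400
        print(f"{rate} Mbps: {days:.1f} days per TB")
```

Plugging in an actual total size for the media you need gives the time a
fully compliant client would take. Any client-side throttle would have to
keep each 10-second window under the first figure.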