Hi,

Thanks for pointing out the confusion. I didn't remember the picturesque wording on the API:Etiquette page :D
But I think that page is more about what MediaWiki itself can do in terms of rate-limiting, and indeed MediaWiki doesn't do rate-limiting on reads. What the WMF infrastructure does, OTOH, is different. So maybe it's a good idea to add a {{Note}} at the top of the etiquette page clarifying that those are general MediaWiki-level rules, and that for the Foundation's infrastructure people should refer to the Robot policy. Do you think that would help?

Cheers,

Giuseppe

On Tue, Apr 8, 2025 at 8:38 PM Novem Linguae <novemling...@gmail.com> wrote:

> Hi Giuseppe,
>
> Thanks for updating the robots policy. I do see some overlap between
> https://wikitech.wikimedia.org/wiki/Robot_policy#Action_API_rules_(i.e._https://en.wikipedia.org/w/api.php?%E2%80%A6_)
> and https://www.mediawiki.org/wiki/API:Etiquette, so it may be worth
> thinking about whether one or both of those pages needs an update to keep
> everything in sync. For example, API:Etiquette doesn’t link to the Robot
> Policy.
>
> Speaking anecdotally, I didn’t know the Robot Policy existed and I assumed
> API:Etiquette was the canonical page for this kind of thing.
>
> Hope this helps.
>
> *Novem Linguae*
> novemling...@gmail.com
>
> *From:* Giuseppe Lavagetto <glavage...@wikimedia.org>
> *Sent:* Tuesday, April 8, 2025 8:08 AM
> *To:* Wikimedia developers <wikitech-l@lists.wikimedia.org>
> *Subject:* [Wikitech-l] Updates to the Robot policy
>
> Hi all,
>
> I’ve updated our Robot Policy[0], which was vastly outdated; its main
> revision dated from 2009.
>
> The new policy isn’t more restrictive than the older one for general
> crawling of the site or the API; on the contrary, we allow higher limits
> than previously stated. But the new policy clarifies a few points and adds
> quite a few systems not covered in the old policy… because they didn’t
> exist at the time.
>
> My intention is to keep this page relevant, one that we update as our
> infrastructure evolves, trying to direct more and more web spiders and
> high-volume scrapers to follow these patterns and reduce their impact on
> the infrastructure.
>
> This update is part of a coordinated effort[1] to guarantee fairer use of
> our very limited hardware resources for our technical community and users,
> so we will progressively start enforcing these rules for non-community
> users[2] who currently violate these guidelines copiously.
>
> If you have suggestions on how to improve the policy, please use the talk
> page to provide feedback.
>
> Cheers,
>
> Giuseppe
>
> [0] https://wikitech.wikimedia.org/wiki/Robot_policy
> [1] See the draft of the annual plan objective here: https://w.wiki/DkD4
> [2] While the general guidelines of the policy apply to any user, the goal
> is not to place restrictions on our community, or on any other
> research/community crawler whose behaviour is in line with the
> aforementioned guidelines. In fact, any bot running in Toolforge or Cloud
> VPS is already part of an allow-list that excludes this traffic from the
> rate limiting we apply at the CDN.
>
> --
> Giuseppe Lavagetto
> Principal Site Reliability Engineer, Wikimedia Foundation
> _______________________________________________
> Wikitech-l mailing list -- wikitech-l@lists.wikimedia.org
> To unsubscribe send an email to wikitech-l-le...@lists.wikimedia.org
> https://lists.wikimedia.org/postorius/lists/wikitech-l.lists.wikimedia.org/

--
Giuseppe Lavagetto
Principal Site Reliability Engineer, Wikimedia Foundation
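P.S. For anyone skimming this thread who hasn't read API:Etiquette recently, here is a minimal sketch (in Python, using the requests library) of what a read request that follows its general recommendations, a descriptive User-Agent with contact details plus the maxlag parameter, might look like. The URL, User-Agent string, contact address, and query values below are placeholders for illustration, not prescriptions; the etiquette page and the Robot policy remain the authoritative references.

    import requests

    # Minimal sketch of a "polite" Action API read request along the lines of
    # the API:Etiquette recommendations. The User-Agent string, contact
    # address, and query below are placeholder values.
    API_URL = "https://en.wikipedia.org/w/api.php"
    HEADERS = {
        # Identify the client and provide a way to contact its operator.
        "User-Agent": "ExampleBot/1.0 (https://example.org/bot; bot-operator@example.org)",
    }
    params = {
        "action": "query",
        "titles": "Main Page",
        "prop": "info",
        "format": "json",
        # maxlag asks the servers to reject the request when replication lag
        # is high, so the client backs off during periods of heavy load.
        "maxlag": 5,
    }

    # Make requests serially rather than in parallel, and honour errors
    # (e.g. a maxlag rejection) by waiting before retrying.
    response = requests.get(API_URL, params=params, headers=HEADERS, timeout=30)
    response.raise_for_status()
    print(response.json())

None of this replaces the Robot policy; it only illustrates the MediaWiki-level etiquette the thread is comparing it against.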