Hi Giuseppe,
Thanks for updating the robots policy. I do see some overlap between https://wikitech.wikimedia.org/wiki/Robot_policy#Action_API_rules_(i.e._https://en.wikipedia.org/w/api.php?%E2%80%A6_) and https://www.mediawiki.org/wiki/API:Etiquette, so it may be worth considering whether one or both of those pages need an update to keep everything in sync. For example, API:Etiquette doesn't link to the Robot Policy. Speaking anecdotally, I didn't know the Robot Policy existed and assumed API:Etiquette was the canonical page for this kind of thing.

Hope this helps.

Novem Linguae
novemling...@gmail.com

From: Giuseppe Lavagetto <glavage...@wikimedia.org>
Sent: Tuesday, April 8, 2025 8:08 AM
To: Wikimedia developers <wikitech-l@lists.wikimedia.org>
Subject: [Wikitech-l] Updates to the Robot policy

Hi all,

I've updated our Robot Policy[0], which was vastly outdated; its last major revision dated from 2009. The new policy is no more restrictive than the old one for general crawling of the site or the API; on the contrary, we now allow higher limits than previously stated. But it clarifies a few points and adds quite a few systems the old policy didn't cover, because they didn't exist at the time.

My intention is to keep this page relevant, updating it as our infrastructure evolves, and to direct more and more web spiders and high-volume scrapers toward these patterns so that they reduce their impact on the infrastructure.

This update is part of a coordinated effort[1] to guarantee fairer use of our very limited hardware resources for our technical community and users, so we will progressively start enforcing these rules against non-community users[2] who currently violate these guidelines copiously.

If you have suggestions on how to improve the policy, please use the talk page to provide feedback.

Cheers,

Giuseppe

[0] https://wikitech.wikimedia.org/wiki/Robot_policy
[1] See the draft of the annual plan objective here: https://w.wiki/DkD4
[2] While the general guidelines of the policy apply to any user, the goal is not to place restrictions on our community, or on any other research/community crawler whose behaviour is in line with the aforementioned guidelines. In fact, any bot running in Toolforge or Cloud VPS is already part of an allow-list that exempts this traffic from the rate limiting we apply at the CDN.

--
Giuseppe Lavagetto
Principal Site Reliability Engineer, Wikimedia Foundation
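For concreteness, here is a minimal sketch of an Action API client that follows the advice the two pages overlap on: a descriptive User-Agent with contact details, the maxlag parameter, and serial requests with backoff. The bot name, contact URL, and query are placeholder assumptions, and it presumes the third-party requests library; treat it as an illustration of the etiquette, not official reference code.

import time
import requests

API_URL = "https://en.wikipedia.org/w/api.php"
HEADERS = {
    # Identify the bot and provide a way to contact its operator
    # (per the Wikimedia User-Agent policy). Placeholder values.
    "User-Agent": "ExampleBot/1.0 (https://example.org/bot; bot@example.org)",
}

def api_get(params, max_retries=3):
    """Issue one Action API request, honouring maxlag and Retry-After."""
    # maxlag=5 asks the servers to reject the request when replicas
    # are more than 5 seconds behind, so bots yield under load.
    params = dict(params, format="json", maxlag=5)
    for _ in range(max_retries):
        resp = requests.get(API_URL, params=params, headers=HEADERS, timeout=30)
        data = resp.json()
        if data.get("error", {}).get("code") == "maxlag":
            # Servers are lagged: wait as instructed, then retry.
            time.sleep(int(resp.headers.get("Retry-After", 5)))
            continue
        return data
    raise RuntimeError("API stayed lagged after several retries")

# Requests are made one at a time (no parallel fetching), per API:Etiquette.
result = api_get({"action": "query", "titles": "Main Page", "prop": "info"})
print(result)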