Hi all,

I’ve updated our Robot Policy[0], which was vastly outdated; its last major revision dated from 2009.
The new policy isn’t more restrictive than the old one for general crawling of the site or the API; on the contrary, we now allow higher limits than previously stated. It does, however, clarify a few points and adds quite a few systems the old policy didn’t cover, simply because they didn’t exist at the time.

My intention is to keep this page relevant and to update it as our infrastructure evolves, directing more and more web spiders and high-volume scrapers toward these patterns so that they reduce their impact on the infrastructure (see the short sketch at the end of this mail for what that can look like).

This update is part of a coordinated effort[1] to guarantee fairer use of our very limited hardware resources for our technical community and users, so we will progressively start enforcing these rules for non-community users[2] that currently violate these guidelines on a large scale.

If you have suggestions on how to improve the policy, please use the talk page to provide feedback.

Cheers,

Giuseppe

[0] https://wikitech.wikimedia.org/wiki/Robot_policy
[1] See the draft of the annual plan objective here: https://w.wiki/DkD4
[2] While the general guidelines of the policy apply to every user, the goal is not to place restrictions on our community, or on any other research/community crawler whose behaviour is in line with these guidelines. In fact, any bot running on Toolforge or Cloud VPS is already on an allow-list that excludes its traffic from the rate limiting we apply at the CDN.

--
Giuseppe Lavagetto
Principal Site Reliability Engineer, Wikimedia Foundation
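P.S. As a rough illustration of the kind of client behaviour the policy is pushing for, here is a minimal sketch of a "polite" scraper: it identifies itself with a descriptive User-Agent that includes contact information, goes through the Action API rather than scraping HTML, and paces its requests. The bot name, contact details, endpoint and one-request-per-second pacing are illustrative assumptions on my part, not values taken from the policy; the policy page[0] has the actual guidance and limits.

  import time
  import requests

  # Example endpoint; any wiki's Action API works the same way.
  API_URL = "https://en.wikipedia.org/w/api.php"

  # Descriptive User-Agent with a way to contact the operator
  # (example values, not a real bot).
  HEADERS = {
      "User-Agent": "ExampleResearchBot/0.1 "
                    "(https://example.org/bot; bot-admin@example.org)"
  }

  # Conservative, assumed pacing; check the policy page for real limits.
  REQUEST_INTERVAL = 1.0  # seconds between requests

  def fetch_intro(title):
      """Fetch the plain-text intro of one page through the Action API."""
      params = {
          "action": "query",
          "prop": "extracts",
          "exintro": 1,
          "explaintext": 1,
          "format": "json",
          "titles": title,
      }
      resp = requests.get(API_URL, params=params, headers=HEADERS, timeout=30)
      resp.raise_for_status()
      pages = resp.json()["query"]["pages"]
      return next(iter(pages.values())).get("extract", "")

  if __name__ == "__main__":
      for title in ["Wikimedia Foundation", "MediaWiki"]:
          print(fetch_intro(title)[:200])
          time.sleep(REQUEST_INTERVAL)  # stay well within the limits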