Many thanks, Giuseppe. I think everyone is better off when these policies are enforced, even the projects that are currently, unwittingly, ignoring those guidelines. We could also dedicate a bit more of our overall budget to hardware. 🦾
SJ

On Tue, Apr 8, 2025 at 11:09 AM Giuseppe Lavagetto <glavage...@wikimedia.org> wrote:

> Hi all,
>
> I’ve updated our Robot Policy[0], which was vastly outdated, the main
> revision being from 2009.
>
> The new policy isn’t more restrictive than the older one for general
> crawling of the site or the API; on the contrary, we allow higher limits
> than previously stated. But the new policy clarifies a few points and adds
> quite a few systems not covered in the old policy… because they didn’t
> exist at the time.
>
> My intention is to keep this page relevant, one that we update as our
> infrastructure evolves, trying to direct more and more web spiders and
> high-volume scrapers to use the patterns and reduce their impact on the
> infrastructure.
>
> This update is part of a coordinated effort[1] to guarantee fairer
> access to our very limited hardware resources for our technical
> community and users, so we will progressively start enforcing these rules
> for non-community users[2] that currently violate these guidelines
> heavily.
>
> If you have suggestions on how to improve the policy, please use the talk
> page to provide feedback.
>
> Cheers,
>
> Giuseppe
>
>
> [0] https://wikitech.wikimedia.org/wiki/Robot_policy
>
> [1] See the draft of the annual plan objective here: https://w.wiki/DkD4
>
> [2] While the general guidelines of the policy apply to any user, the goal
> is not to place restrictions on our community, or any other
> research/community crawler whose behaviour is in line with the
> aforementioned guidelines. In fact, any bot running in Toolforge or Cloud
> VPS is already part of an allow-list that excludes this traffic from the
> rate limiting we apply at the CDN.
>
> --
> Giuseppe Lavagetto
> Principal Site Reliability Engineer, Wikimedia Foundation
> _______________________________________________
> Wikitech-l mailing list -- wikitech-l@lists.wikimedia.org
> To unsubscribe send an email to wikitech-l-le...@lists.wikimedia.org
> https://lists.wikimedia.org/postorius/lists/wikitech-l.lists.wikimedia.org/

--
Samuel Klein @metasj w:user:sj +1 617 529 4266