Many thanks, Giuseppe. I think everyone is better off when these policies are
enforced, even the projects currently unwittingly ignoring those
guidelines. We could also dedicate a bit more of our overall budget to
hardware. 🦾

SJ

On Tue, Apr 8, 2025 at 11:09 AM Giuseppe Lavagetto <glavage...@wikimedia.org>
wrote:

> Hi all,
>
> I’ve updated our Robot Policy[0], which was vastly outdated, the main
> revision being from 2009.
>
> The new policy isn’t more restrictive than the older one for general
> crawling of the site or the API; on the contrary we allow higher limits
> than previously stated. But the new policy clarifies a few points and adds
> quite a few systems not covered in the old policy… because they didn’t
> exist at the time.
>
> My intention is to keep this page relevant and to update it as our
> infrastructure evolves, directing more and more web spiders and
> high-volume scrapers to use the patterns it describes and reduce their
> impact on the infrastructure.
>
> This update is part of a coordinated effort[1] to guarantee fairer use of
> our very limited hardware resources for our technical community and
> users, so we will progressively start enforcing these rules for
> non-community users[2] that currently violate these guidelines
> copiously.
>
> If you have suggestions on how to improve the policy, please use the talk
> page to provide feedback.
>
> Cheers,
>
> Giuseppe
>
>
> [0] https://wikitech.wikimedia.org/wiki/Robot_policy
>
> [1] See the draft of the annual plan objective here: https://w.wiki/DkD4
>
> [2] While the general guidelines of the policy apply to any user, the goal
> is not to place restrictions on our community, or any other
> research/community crawler whose behaviour is in line with the
> aforementioned guidelines. In fact, any bot running in Toolforge or Cloud
> VPS is already part of an allow-list that excludes this traffic from the
> rate limiting we apply at the CDN.
>
> --
> Giuseppe Lavagetto
> Principal Site Reliability Engineer, Wikimedia Foundation
> _______________________________________________
> Wikitech-l mailing list -- wikitech-l@lists.wikimedia.org
> To unsubscribe send an email to wikitech-l-le...@lists.wikimedia.org
> https://lists.wikimedia.org/postorius/lists/wikitech-l.lists.wikimedia.org/



-- 
Samuel Klein          @metasj           w:user:sj          +1 617 529 4266
