Hi all,

I’ve updated our Robot Policy[0], which was vastly outdated; the last major
revision dated back to 2009.

The new policy isn’t more restrictive than the old one for general
crawling of the site or the API; on the contrary, we allow higher limits
than previously stated. However, it clarifies a few points and covers
quite a few systems the old policy didn’t… because they didn’t exist at
the time.

My intention is to keep this page relevant and to update it as our
infrastructure evolves, directing more and more web spiders and
high-volume scrapers towards the recommended patterns so they reduce
their impact on the infrastructure.
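
To make that concrete, here is a minimal sketch (in Python, using the
requests library) of the kind of well-behaved client the policy is aiming
for: it identifies itself with a descriptive User-Agent that includes
contact information, uses the Action API rather than scraping HTML, and
throttles its own request rate. The header text, delay, and endpoint below
are illustrative assumptions, not requirements quoted from the policy:

    import time
    import requests

    # Illustrative values only; consult the Robot Policy for actual guidance.
    USER_AGENT = "ExampleResearchBot/1.0 (https://example.org/bot; bot-admin@example.org)"
    API_URL = "https://en.wikipedia.org/w/api.php"
    DELAY_SECONDS = 1.0  # conservative, self-imposed pacing between requests

    session = requests.Session()
    session.headers.update({"User-Agent": USER_AGENT})

    def fetch_extract(title):
        """Fetch a plain-text extract of one page via the Action API."""
        params = {
            "action": "query",
            "prop": "extracts",
            "explaintext": 1,
            "titles": title,
            "format": "json",
        }
        response = session.get(API_URL, params=params, timeout=30)
        response.raise_for_status()
        return response.json()

    for page in ["Wikipedia:Bots", "MediaWiki"]:
        data = fetch_extract(page)
        # ... process data ...
        time.sleep(DELAY_SECONDS)  # throttle so we don't hammer the servers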

This update is part of a coordinated effort[1] to guarantee fairer use of
our very limited hardware resources for our technical community and users,
so we will progressively start enforcing these rules for non-community
users[2] who currently violate these guidelines copiously.

If you have suggestions on how to improve the policy, please use the talk
page to provide feedback.

Cheers,

Giuseppe


[0] https://wikitech.wikimedia.org/wiki/Robot_policy

[1] See the draft of the annual plan objective here: https://w.wiki/DkD4

[2] While the general guidelines of the policy apply to every user, the goal
is not to place restrictions on our community, nor on any other
research/community crawler whose behaviour is in line with the
aforementioned guidelines. In fact, any bot running on Toolforge or Cloud
VPS is already part of an allow-list that excludes its traffic from the
rate limiting we apply at the CDN.

-- 
Giuseppe Lavagetto
Principal Site Reliability Engineer, Wikimedia Foundation