Hi Giuseppe,


Thanks for updating the robots policy. I do see some overlap between 
https://wikitech.wikimedia.org/wiki/Robot_policy#Action_API_rules_(i.e._https://en.wikipedia.org/w/api.php?%E2%80%A6_)
and https://www.mediawiki.org/wiki/API:Etiquette, so it may be worth 
considering whether one or both of those pages need an update to keep 
everything in sync. For example, API:Etiquette doesn’t link to the Robot 
Policy.
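
For context, both pages converge on the same basics: send a descriptive 
User-Agent with contact info, pass maxlag so your client backs off when 
replication lag is high, and make requests in series rather than in parallel. 
A minimal sketch in Python (the bot name and contact address are placeholders 
I made up):

    import requests

    session = requests.Session()
    # API:Etiquette asks for a descriptive User-Agent with a way to
    # contact the operator; "ExampleBot" here is a placeholder.
    session.headers["User-Agent"] = (
        "ExampleBot/0.1 (https://example.org/bot; botops@example.org)"
    )

    resp = session.get(
        "https://en.wikipedia.org/w/api.php",
        params={
            "action": "query",
            "titles": "Main Page",
            "format": "json",
            # With maxlag set, the API returns an error instead of
            # serving the request when replica lag exceeds 5 seconds,
            # so a well-behaved client can sleep and retry.
            "maxlag": 5,
        },
    )
    data = resp.json()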


Speaking anecdotally, I didn’t know the Robot Policy existed; I assumed 
API:Etiquette was the canonical page for this kind of thing.


Hope this helps.


Novem Linguae

novemling...@gmail.com


From: Giuseppe Lavagetto <glavage...@wikimedia.org> 
Sent: Tuesday, April 8, 2025 8:08 AM
To: Wikimedia developers <wikitech-l@lists.wikimedia.org>
Subject: [Wikitech-l] Updates to the Robot policy


Hi all,


I’ve updated our Robot Policy[0], which was vastly outdated: the main revision 
dated from 2009.

The new policy isn’t more restrictive than the old one for general crawling of 
the site or the API; on the contrary, we now allow higher limits than 
previously stated. It does, however, clarify a few points and cover quite a 
few systems the old policy didn’t… because they didn’t exist at the time.


My intention is to keep this page relevant, updating it as our infrastructure 
evolves and using it to direct more and more web spiders and high-volume 
scrapers toward the patterns it describes, reducing their impact on the 
infrastructure.


This update is part of a coordinated effort[1] to guarantee fairer use of our 
very limited hardware resources for our technical community and users. We will 
therefore progressively start enforcing these rules against non-community 
users[2] who currently violate these guidelines in high volume.


If you have suggestions on how to improve the policy, please use the talk page 
to provide feedback.


Cheers,


Giuseppe


[0] https://wikitech.wikimedia.org/wiki/Robot_policy

[1] See the draft of the annual plan objective here: https://w.wiki/DkD4

[2] While the general guidelines of the policy apply to every user, the goal 
is not to place restrictions on our community, or on any other research or 
community crawler whose behaviour is in line with those guidelines. In fact, 
any bot running on Toolforge or Cloud VPS is already on an allow-list that 
excludes its traffic from the rate limiting we apply at the CDN.


-- 

Giuseppe Lavagetto
Principal Site Reliability Engineer, Wikimedia Foundation
