Hi,

Thanks for pointing out the confusion. I had forgotten the picturesque
wording on the API:Etiquette page :D

But I think that page is more about what MediaWiki itself can do in terms of
rate limiting, and indeed MediaWiki doesn't rate-limit reads.
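
What the etiquette page does ask is that clients throttle themselves: set a
descriptive User-Agent and send the maxlag parameter, so reads back off when
the database replicas are lagged. A minimal sketch in Python (the endpoint,
bot name, and retry count are just illustrative, not anything either page
prescribes):

    import time
    import requests

    API_URL = "https://en.wikipedia.org/w/api.php"
    HEADERS = {
        # API:Etiquette asks for a descriptive User-Agent with contact info.
        "User-Agent": "ExampleBot/0.1 (https://example.org/bot; ops@example.org)",
    }

    def polite_get(params, retries=3):
        # maxlag=5 makes MediaWiki refuse the request with a "maxlag" error
        # whenever replica lag exceeds 5 seconds, so the client yields under load.
        params = {**params, "format": "json", "maxlag": 5}
        for _ in range(retries):
            resp = requests.get(API_URL, params=params, headers=HEADERS, timeout=30)
            data = resp.json()
            if data.get("error", {}).get("code") == "maxlag":
                # A maxlag refusal carries a Retry-After header with a suggested wait.
                time.sleep(int(resp.headers.get("Retry-After", 5)))
                continue
            return data
        raise RuntimeError("gave up after repeated maxlag errors")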

What the WMF infrastructure does, OTOH, is a different matter. So maybe it's
a good idea to add a {{Note}} at the top of the etiquette page clarifying
that those are general MediaWiki rules, and that for the Foundation's
infrastructure people should refer to the Robot policy.
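
On the infrastructure side, enforcement would look like ordinary HTTP
throttling to a client. Nothing in this thread spells out the mechanism, but
assuming standard 429 + Retry-After semantics at the CDN, a well-behaved
scraper might do something like:

    import time
    import requests

    def fetch_with_backoff(url, headers, max_attempts=5):
        delay = 1  # fallback wait in seconds when no Retry-After is given
        for _ in range(max_attempts):
            resp = requests.get(url, headers=headers, timeout=30)
            if resp.status_code == 429:
                # Honour the server's suggested wait, then retry.
                time.sleep(float(resp.headers.get("Retry-After", delay)))
                delay = min(delay * 2, 60)  # exponential fallback, capped
                continue
            resp.raise_for_status()
            return resp
        raise RuntimeError("still throttled after retries")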

Do you think adding such a note would help?

Cheers,

Giuseppe

On Tue, Apr 8, 2025 at 8:38 PM Novem Linguae <novemling...@gmail.com> wrote:

> Hi Giuseppe,
>
>
>
> Thanks for updating the robots policy. I do see some overlap between
> https://wikitech.wikimedia.org/wiki/Robot_policy#Action_API_rules_(i.e._https://en.wikipedia.org/w/api.php?%E2%80%A6_)
> and https://www.mediawiki.org/wiki/API:Etiquette, so it may be worth
> thinking about whether one or both of those pages need an update to keep
> everything in sync. For example, API Etiquette doesn’t link to the Robot
> Policy.
>
>
>
> Speaking anecdotally, I didn’t know the Robot Policy existed and I assumed
> API Etiquette was the canonical page for this kind of thing.
>
>
>
> Hope this helps.
>
>
>
> *Novem Linguae*
>
> novemling...@gmail.com
>
>
>
> *From:* Giuseppe Lavagetto <glavage...@wikimedia.org>
> *Sent:* Tuesday, April 8, 2025 8:08 AM
> *To:* Wikimedia developers <wikitech-l@lists.wikimedia.org>
> *Subject:* [Wikitech-l] Updates to the Robot policy
>
>
>
> Hi all,
>
>
>
> I’ve updated our Robot Policy[0], which was vastly outdated; its last
> major revision dated from 2009.
>
> The new policy isn’t more restrictive than the older one for general
> crawling of the site or the API; on the contrary, we allow higher limits
> than previously stated. But the new policy clarifies a few points and
> covers quite a few systems the old one didn’t… because they didn’t exist
> at the time.
>
>
>
> My intention is to keep this page relevant, updating it as our
> infrastructure evolves, and to steer more and more web spiders and
> high-volume scrapers toward the patterns it recommends, reducing their
> impact on the infrastructure.
>
>
>
> This update is part of a coordinated effort[1] to guarantee fairer use of
> our very limited hardware resources for our technical community and
> users, so we will progressively start enforcing these rules against
> non-community users[2] who currently violate these guidelines on a large
> scale.
>
>
>
> If you have suggestions on how to improve the policy, please use the talk
> page to provide feedback.
>
>
>
> Cheers,
>
>
>
> Giuseppe
>
>
>
> [0] https://wikitech.wikimedia.org/wiki/Robot_policy
>
> [1] See the draft of the annual plan objective here: https://w.wiki/DkD4
>
> [2] While the general guidelines of the policy apply to any user, the goal
> is not to place restrictions on our community, or on any other
> research/community crawler whose behaviour is in line with the
> aforementioned guidelines. In fact, any bot running on Toolforge or Cloud
> VPS is already on an allow-list that exempts its traffic from the rate
> limiting we apply at the CDN.
>
>
>
> --
>
> Giuseppe Lavagetto
> Principal Site Reliability Engineer, Wikimedia Foundation


-- 
Giuseppe Lavagetto
Principal Site Reliability Engineer, Wikimedia Foundation
_______________________________________________
Wikitech-l mailing list -- wikitech-l@lists.wikimedia.org
To unsubscribe send an email to wikitech-l-le...@lists.wikimedia.org
https://lists.wikimedia.org/postorius/lists/wikitech-l.lists.wikimedia.org/
