On 2/17/26 12:18, Chris Angelico via pydotorg-www wrote:
> On Wed, 18 Feb 2026 at 06:13, Marc-Andre Lemburg <[email protected]> wrote:
>> Unfortunately, there's not a lot we can do against bots hitting the wiki.
> Yes, but also there ARE *some* things we could do.
>> Esp. AI crawlers have become the #1 "users" of the wiki in the past few
>> months and those don't stick to any rules you give them. They also use
>> multiple IP addresses, so it feels a bit like a DDoS.
>>
>> I don't think AI crawlers are a bad thing, but it really doesn't help if
>> they bring down systems.
> Agreed, so that would mean we'd need to rate-limit those requests in
> some way. I'm sure we're not the first to run into this problem; this
> has to be a known issue.
Absolutely... people are fighting that everywhere.
"I don't think AI crawlers are a bad thing" - in theory, no, but for
some reason some of them scrape in a loop. When they brought down
scons.org it was thousands of hits a minute, of the same pages, from the
same range of IPs in China that associate with one of the known AI firms
that deploys scraper bots. Why do you need to fetch the same thing
repeatedly? Nobody seems to be able to answer that.
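For what it's worth, spotting that kind of burst doesn't take anything
fancy - a few lines of Python over the web server's access log will show
it. Purely an illustrative sketch; the log path and the combined log
format are assumptions on my part, not how python.org is actually set up:

import re
import sys
from collections import Counter

# Match the client IP and timestamp of a combined-format access log line.
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[([^\]]+)\]')

counts = Counter()
with open(sys.argv[1] if len(sys.argv) > 1 else "access.log") as f:
    for line in f:
        m = LOG_LINE.match(line)
        if not m:
            continue
        ip, ts = m.groups()
        prefix = ".".join(ip.split(".")[:3]) + ".0/24"  # naive IPv4 /24 bucket
        minute = ts[:17]                                # "17/Feb/2026:12:18"
        counts[(prefix, minute)] += 1

# Print the worst offenders; thousands of hits in a single minute from
# one range is the pattern that took scons.org down.
for (prefix, minute), n in counts.most_common(10):
    print(f"{n:>7}  {prefix}  {minute}")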
Most solutions come down to sending requests through a proxy that can
reject (or at least rate-limit) some of them. Cloudflare is obviously
popular. Various open source projects use Anubis, but there's some setup
involved with that (and presumably ongoing maintenance), and we were
already talking about being resource-constrained as far as python.org
administration goes.
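To make the rate-limiting idea a bit more concrete: what those proxies
do, at heart, is a per-client token bucket, and you could even do that at
the WSGI layer in front of the wiki. The sketch below is just that, a
sketch - the numbers aren't tuned for our traffic, and moin_wsgi_app is a
made-up name standing in for whatever the wiki's WSGI entry point is:

import time

class RateLimit:
    """Crude per-IP token-bucket limiter as WSGI middleware."""

    def __init__(self, app, rate=1.0, burst=20):
        self.app = app
        self.rate = rate      # tokens refilled per second, per client IP
        self.burst = burst    # maximum bucket size
        self.buckets = {}     # ip -> (tokens, last_seen); never evicted here

    def __call__(self, environ, start_response):
        forwarded = environ.get("HTTP_X_FORWARDED_FOR", "")
        ip = (forwarded.split(",")[0].strip()
              or environ.get("REMOTE_ADDR", "unknown"))
        now = time.monotonic()
        tokens, last = self.buckets.get(ip, (self.burst, now))
        tokens = min(self.burst, tokens + (now - last) * self.rate)
        if tokens < 1:
            # Out of tokens: refuse the request instead of hitting the wiki.
            self.buckets[ip] = (tokens, now)
            start_response("429 Too Many Requests",
                           [("Content-Type", "text/plain"),
                            ("Retry-After", "60")])
            return [b"Rate limit exceeded\n"]
        self.buckets[ip] = (tokens - 1, now)
        return self.app(environ, start_response)

# application = RateLimit(moin_wsgi_app)  # hypothetical wiki WSGI app

Anubis takes a different tack (it makes the client solve a small
proof-of-work challenge before letting it through), but the overall
shape - something in front of the app that can say no - is the same.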
I'll put in my own vote for keeping a wiki-like interface, but then I'm
definitely old-fashioned about tech, so take that for what it's worth.
_______________________________________________
pydotorg-www mailing list -- [email protected]
To unsubscribe send an email to [email protected]
https://mail.python.org/mailman3//lists/pydotorg-www.python.org
Member address: [email protected]