On Thu, Apr 24, 2025 at 3:16 PM MusikAnimal <musikani...@gmail.com> wrote:
>
> Note that this exercise of IP range whack-a-mole is nothing new to VPS tools. 
> I maintain two VPS projects (XTools, WS Export) that constantly suffer from 
> aggressive web crawlers and disruptive automation. We've been doing the 
> manual IP block thing for years :(

One interesting aspect of both of those Cloud VPS projects is that a
number of content wikis link directly to them. I think this greatly
increases their exposure to crawler traffic in general.

> I suggest the IP denylist be applied to all of WMCS 
> <https://phabricator.wikimedia.org/T226688>. We're able to get by for XTools 
> and WS Export because XFF headers were specially enabled for this 
> counter-abuse purpose. However most VPS tools and all of Toolforge don't have 
> such luxury. If there are bots pounding away, there's no means to stop them 
> currently (unless they are good bots with an identifiable UA). Even if we 
> could detect them, it seems better to reduce the repetitive effort and give 
> all of WMCS the same treatment.

You are talking about three completely separate HTTP edges at this
point. They all live on the same core Cloud VPS infrastructure, but
there is no common HTTPS connection between the *.toolforge.org proxy,
the *.wmcloud.org proxy, and the Beta Cluster CDN. The first two share
some nginx stack configuration, but in practice are very different
deployments with independent public IP addresses. The third is
fundamentally a partial clone of the production wikis' CDN edge,
although scaled down and missing some newer components that nobody has
yet done the work to introduce.
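
Whichever edge ends up enforcing a denylist, the check itself is small.
A minimal Python sketch, assuming a single trusted proxy sits in front
of the app and appends the connecting address to X-Forwarded-For. The
names DENYLIST, client_ip, and is_denied are made up for illustration,
and the ranges are documentation-only examples, not real offenders:

```python
import ipaddress

# Hypothetical denylist built from documentation-only ranges
# (RFC 5737 for IPv4, RFC 3849 for IPv6).
DENYLIST = [
    ipaddress.ip_network(cidr)
    for cidr in ("192.0.2.0/24", "198.51.100.0/24", "2001:db8::/32")
]


def client_ip(xff, trusted_hops=1):
    """Pick the address added by the outermost trusted proxy.

    Each proxy appends the peer it received the request from, so the
    left-most entries are client-supplied and can be spoofed. Assumes
    `trusted_hops` proxies that we control sit in front of the app.
    """
    hops = [h.strip() for h in xff.split(",") if h.strip()]
    if len(hops) < trusted_hops:
        return None
    try:
        return ipaddress.ip_address(hops[-trusted_hops])
    except ValueError:
        # Malformed entry: treat as unknown rather than guessing.
        return None


def is_denied(xff):
    """True if the trusted client address falls in a denylisted range."""
    ip = client_ip(xff)
    return ip is not None and any(ip in net for net in DENYLIST)
```

Reading the right-most entry rather than the left-most is the important
design choice here: a crawler can put anything it likes at the front of
the header.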

> I'll also note that some farms of web crawlers can't feasibly be blocked 
> whack-a-mole style. This is the situation we're currently dealing with over 
> at <https://phabricator.wikimedia.org/T384711#10759017>.

Truly distributed attack patterns (botnet traffic) are really hard to
defend against with just an Apache2 instance. This is actually a place
where someone could try experimenting with a filtering proxy such as
Anubis [0], go-away [1], or openappsec [2]. Having some experience
with these tools could then lead us into better discussions about
deploying them more widely or making them easier to use in targeted
projects.

[0]: https://anubis.techaro.lol/
[1]: https://git.gammaspectra.live/git/go-away
[2]: https://github.com/openappsec/openappsec
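
For anyone who has not looked at this class of tool: a proof-of-work
challenge is one technique a filtering proxy can use to make bulk
crawling expensive while staying cheap for a single visitor. The toy
Python sketch below only illustrates that idea. It is not the protocol
of any of the tools above, and make_challenge, solve, and verify are
hypothetical names:

```python
import hashlib
import secrets


def make_challenge():
    """Server side: issue a random challenge string."""
    return secrets.token_hex(16)


def verify(challenge, nonce, difficulty):
    """Server side: one hash to check the client's answer.

    `difficulty` is the number of leading zero hex digits required.
    """
    digest = hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()
    return digest.startswith("0" * difficulty)


def solve(challenge, difficulty):
    """Client side: brute-force a nonce that satisfies the check."""
    nonce = 0
    while not verify(challenge, nonce, difficulty):
        nonce += 1
    return nonce


if __name__ == "__main__":
    challenge = make_challenge()
    nonce = solve(challenge, difficulty=4)
    print(challenge, nonce, verify(challenge, nonce, 4))
```

Each extra digit of difficulty multiplies the client's expected work by
16, while verification stays a single hash on the server.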

Bryan
-- 
Bryan Davis                                        Wikimedia Foundation
Principal Software Engineer                               Boise, ID USA
[[m:User:BDavis_(WMF)]]                                      irc: bd808
_______________________________________________
Wikitech-l mailing list -- wikitech-l@lists.wikimedia.org
To unsubscribe send an email to wikitech-l-le...@lists.wikimedia.org
https://lists.wikimedia.org/postorius/lists/wikitech-l.lists.wikimedia.org/
