On Thu, Apr 24, 2025 at 3:16 PM MusikAnimal <musikani...@gmail.com> wrote:
>
> Note that this exercise of IP range whack-a-mole is nothing new to VPS tools.
> I maintain two VPS projects (XTools, WS Export) that constantly suffer from
> aggressive web crawlers and disruptive automation. We've been doing the
> manual IP block thing for years :(

An interesting aspect of both of those Cloud VPS projects is that they are
linked to directly from a number of content wikis. I think this greatly
extends their exposure to crawler traffic in general.

> I suggest the IP denylist be applied to all of WMCS
> <https://phabricator.wikimedia.org/T226688>. We're able to get by for XTools
> and WS Export because XFF headers were specially enabled for this
> counter-abuse purpose. However most VPS tools and all of Toolforge don't have
> such luxury. If there are bots pounding away, there's no means to stop them
> currently (unless they are good bots with an identifiable UA). Even if we
> could detect them, it seems better to reduce the repetitive effort and give
> all of WMCS the same treatment.

You are talking about three completely separate HTTP edges at this point.
They all live on the same core Cloud VPS infrastructure, but there is no
common HTTPS connection between the *.toolforge.org proxy, the *.wmcloud.org
proxy, and the Beta Cluster CDN. The first two share some nginx stack
configuration, but in practice they are very different deployments with
independent public IP addresses. The third is fundamentally a partial clone
of the production wikis' CDN edge, although scaled down and missing some
newer components that nobody has yet done the work to introduce.

> I'll also note that some farms of web crawlers can't feasibly be blocked
> whack-a-mole style. This is the situation we're currently dealing with over
> at <https://phabricator.wikimedia.org/T384711#10759017>.

Truly distributed attack patterns (botnet traffic) are really hard to defend
against with just an Apache2 instance. This is actually a place where someone
could try experimenting with a filtering proxy like Anubis [0], go-away [1],
or openappsec [2]. Having some experience with these tools could then lead us
into better discussions about deploying them more widely or making them
easier to use in targeted projects.

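For anyone who has not done the manual IP range blocking described earlier in this thread, here is a rough Python sketch of the idea: pull a client address out of an X-Forwarded-For header and test it against a list of CIDR ranges. This is illustrative only and is not how XTools, WS Export, or any WMCS proxy actually implements it. The names `DENYLIST`, `client_ip_from_xff`, and `is_denied` are made up for the example, the ranges are reserved documentation prefixes, and it assumes exactly one trusted proxy in front of the application, so the right-most XFF entry is the one to believe.

```python
import ipaddress

# Hypothetical denylist. These are reserved documentation ranges
# (RFC 5737 and RFC 3849), not real offenders.
DENYLIST = [
    ipaddress.ip_network(cidr)
    for cidr in ("192.0.2.0/24", "198.51.100.0/24", "2001:db8::/32")
]


def client_ip_from_xff(xff_header):
    """Return the right-most parseable address in an XFF value, or None.

    Assumption: a single trusted proxy appends the address of the peer
    that connected to it, so the right-most entry is the one to trust.
    Entries further left are client-supplied and can be forged.
    """
    for part in reversed(xff_header.split(",")):
        try:
            return ipaddress.ip_address(part.strip())
        except ValueError:
            # Skip entries that are not valid IP addresses.
            continue
    return None


def is_denied(xff_header, denylist=DENYLIST):
    """True if the trusted client address falls inside any denied range."""
    ip = client_ip_from_xff(xff_header)
    if ip is None:
        return False
    # Only compare addresses and networks of the same IP version.
    return any(ip.version == net.version and ip in net for net in denylist)
```

For example, `is_denied("203.0.113.7, 192.0.2.55")` is true because the right-most entry is inside 192.0.2.0/24, while `is_denied("192.0.2.55, 203.0.113.7")` is false because only the left-most, untrusted entry matches. A check like this only helps against a manageable number of ranges, which is exactly why it breaks down for widely distributed crawler farms.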
[0]: https://anubis.techaro.lol/
[1]: https://git.gammaspectra.live/git/go-away
[2]: https://github.com/openappsec/openappsec

Bryan
-- 
Bryan Davis                    Wikimedia Foundation
Principal Software Engineer    Boise, ID USA
[[m:User:BDavis_(WMF)]]        irc: bd808
_______________________________________________
Wikitech-l mailing list -- wikitech-l@lists.wikimedia.org
To unsubscribe send an email to wikitech-l-le...@lists.wikimedia.org
https://lists.wikimedia.org/postorius/lists/wikitech-l.lists.wikimedia.org/