On Tue, Apr 15, 2025 at 2:27 PM Bryan Davis <bd...@wikimedia.org> wrote:
>
> I just wanted to give folks a heads up that in response to a few
> traffic storms in the Beta Cluster (deployment-prep Cloud VPS project)
> we have started using the very coarse protection of blocking IP
> ranges. These blocks are being applied at the Beta Cluster CDN edge
> where we have Varnish configuration that can discard traffic based on
> a list of CIDR ranges.
>
> The ranges blocked at any point in time should be visible in the
> deployment-prep project's Hiera configuration that is logged in the
> cloud/instance-puppet.git repo. [0]
>
> The hardly scientific process of choosing what to block so far has
> been done with processes like the one documented at
> https://phabricator.wikimedia.org/T392003. Hashar came up with a shell
> one-liner to count requests by IP address or IP address prefix
> depending on the regex provided. We then take the top addresses
> produced by that log filtering and perform a `whois` lookup to find
> the associated IP address allocation. The CIDR blocks associated with
> the allocation are then put into hiera config, a Puppet run is forced,
> and Varnish is restarted. Repeat as necessary to get to a reasonable
> rate of requests passing through Varnish to the backing MediaWiki
> instances where we are examining the logs.
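
A note on the `whois` step in the quoted process: some whois servers
report an allocation as a start-end address range rather than as CIDR
blocks. Below is a minimal sketch of that conversion, assuming IPv4 and
Python's standard `ipaddress` module. The helper name `range_to_cidrs`
and the sample range (taken from the 192.0.2.0/24 documentation block)
are illustrative only and are not part of the actual tooling.

```python
import ipaddress


def range_to_cidrs(first, last):
    """Return the CIDR blocks that exactly cover the inclusive range first..last.

    Hypothetical helper: `first` and `last` are address strings copied
    by hand from a whois record.
    """
    networks = ipaddress.summarize_address_range(
        ipaddress.ip_address(first),
        ipaddress.ip_address(last),
    )
    return [str(network) for network in networks]


if __name__ == "__main__":
    # A range that does not fall on a single prefix boundary needs several blocks.
    print(range_to_cidrs("192.0.2.0", "192.0.2.130"))
    # prints ['192.0.2.0/25', '192.0.2.128/31', '192.0.2.130/32']
```

The resulting strings are in the same X.Y.Z.W/N form that a list of CIDR
ranges expects, so they could be pasted into the config by hand.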

A week goes by and we find ourselves back in the same "beta crushed by
bot traffic" place again. [2] I tried blocking selectively at first [3],
but I was not making much progress in lowering the load. After noticing
that a lot of the traffic was coming from ranges assigned to orgs in
Brazil, I tried blocking a lot of Class B networks (X.Y.0.0/16) that
were listed on https://ipnetinfo.com/country/BR and showing traffic in
the logs. [4] This helped a bit, but things were still looking pretty
bad.

I got frustrated and decided to see if blocking Class A networks
(X.0.0.0/8) would do anything. I wrote a delightfully horrible script
that buckets the last 50,000 requests by Class A network and outputs a
cut-and-paste ready list of every network with more than 500 requests.
[5] I blocked these IP ranges, waited a bit to see what happened, and
repeated a few times.

This seems to have worked so far, but it does not make me very happy.
The blocks are really wide and almost certain to sweep up legitimate
traffic sooner or later if we keep doing things this way. We have some
newer tools in use with the production networks that might make it
easier for us to rate limit aggressively at the edge rather than
applying outright blocks to large ranges.

> If you feel that you have legitimate traffic for the Beta Cluster to
> handle that has gotten swept up in one of these blocks, please reach
> out by filing a task on the #beta-cluster-infrastructure Phabricator
> board. [1]
>
> If you think working to make this process of blocking easier or
> unnecessary sounds like a fun project I would love to chat more. Hit
> me up via email, libera.chat irc, or on-wiki with your ideas.
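
For anyone curious what the Class A bucketing described above amounts
to, here is a rough Python sketch. It is not the script from [5]. It
assumes the client IPs have already been pulled out of the logs as
plain strings, it only handles IPv4, and the function name
`busy_slash8s` is made up for illustration. The defaults mirror the
50,000 request window and the 500 request threshold mentioned above.

```python
from collections import Counter, deque
import ipaddress


def busy_slash8s(client_ips, window=50_000, threshold=500):
    """Count recent requests per /8 and return the networks above a threshold.

    client_ips: iterable of client address strings, oldest first (assumed input).
    Returns (cidr, count) pairs, busiest first, with count > threshold.
    """
    # Keep only the most recent `window` entries.
    recent = deque(client_ips, maxlen=window)
    counts = Counter()
    for ip in recent:
        try:
            addr = ipaddress.IPv4Address(ip)
        except ValueError:
            continue  # skip IPv6 and malformed entries in this sketch
        first_octet = int(addr) >> 24
        counts[f"{first_octet}.0.0.0/8"] += 1
    return [(net, n) for net, n in counts.most_common() if n > threshold]


if __name__ == "__main__":
    # Tiny made-up sample; real input would come from the edge logs.
    sample = ["10.1.2.3"] * 3 + ["177.5.5.5"] * 2 + ["not-an-ip"]
    for net, n in busy_slash8s(sample, window=50, threshold=2):
        print(f"{net}  # {n} requests")
```

The width of the bucket is the whole problem with this approach: a /8
covers roughly 16.7 million addresses, so every entry in the output
blocks far more than the clients that triggered it.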
>
> [0]:
> https://gerrit.wikimedia.org/r/plugins/gitiles/cloud/instance-puppet/+/refs/heads/master/deployment-prep/_.yaml
> [1]: https://phabricator.wikimedia.org/tag/beta-cluster-infrastructure/

[2]: https://phabricator.wikimedia.org/T392534
[3]: https://phabricator.wikimedia.org/T392534#10763059
[4]: https://phabricator.wikimedia.org/T392534#10763134
[5]: https://phabricator.wikimedia.org/T392534#10763235

Bryan
--
Bryan Davis                    Wikimedia Foundation
Principal Software Engineer    Boise, ID USA
[[m:User:BDavis_(WMF)]]        irc: bd808