On Tue, Apr 15, 2025 at 2:27 PM Bryan Davis <bd...@wikimedia.org> wrote:
>
> I just wanted to give folks a heads up that in response to a few
> traffic storms in the Beta Cluster (deployment-prep Cloud VPS project)
> we have started using the very coarse protection of blocking IP
> ranges. These blocks are being applied at the Beta Cluster CDN edge
> where we have Varnish configuration that can discard traffic based on
> a list of CIDR ranges.
>
> The ranges blocked at any point in time should be visible in the
> deployment-prep project's Hiera configuration that is logged in the
> cloud/instance-puppet.git repo. [0]
>
> The hardly scientific process of choosing what to block has so far
> followed steps like the ones documented at
> https://phabricator.wikimedia.org/T392003. Hashar came up with a shell
> one-liner to count requests by IP address or IP address prefix
> depending on the regex provided. We then take the top addresses
> produced by that log filtering and perform a `whois` lookup to find
> the associated IP address allocation. The CIDR blocks associated with
> the allocation are then put into hiera config, a Puppet run is forced,
> and Varnish is restarted. Repeat as necessary to get to a reasonable
> rate of requests passing through Varnish to the backing MediaWiki
> instances where we are examining the logs.

A week goes by and we find ourselves back in the same "beta crushed by
bot traffic" place again. [2] I tried blocking selectively at first
[3], but I was not making much progress in lowering the load. After
noticing that a lot of the traffic was coming from ranges assigned to
orgs in Brazil I tried blocking a lot of Class B networks (X.Y.0.0/16)
that were on https://ipnetinfo.com/country/BR and showing traffic in
the logs. [4] This helped a bit, but things were still looking pretty
bad.

I got frustrated and decided to see if blocking Class A networks
(X.0.0.0/8) would do anything. I wrote a delightfully horrible script
that buckets the last 50,000 requests by Class A network and outputs a
cut-and-paste ready list of all of them with more than 500 requests.
[5] I blocked these IP ranges, waited to see what happened for a bit,
and repeated a few times.
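
For anyone who wants the general idea without reading the script in
[5], here is a minimal Python sketch of the same bucketing. It is not
the script I actually ran. It assumes the client IPs have already been
pulled out of the request log into a list of strings, and the function
name and parameters are made up for illustration.

```python
from collections import Counter


def top_slash8_networks(client_ips, window=50_000, threshold=500):
    """Return /8 CIDR strings seen more than `threshold` times.

    Only the last `window` entries of `client_ips` are considered.
    Hypothetical helper: the input format is an assumption, not what
    the real log looks like.
    """
    recent = client_ips[-window:]
    counts = Counter()
    for ip in recent:
        parts = ip.split(".")
        # Skip IPv6 addresses and anything that is not dotted-quad IPv4.
        if len(parts) != 4 or not parts[0].isdigit():
            continue
        counts[int(parts[0])] += 1
    # most_common() orders the buckets from busiest to quietest.
    return [
        f"{octet}.0.0.0/8"
        for octet, hits in counts.most_common()
        if hits > threshold
    ]
```

The returned strings are in a shape that could be pasted into a list
of CIDR ranges. The 50,000 and 500 defaults mirror the numbers above;
both are knobs to tune, not recommendations.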

This seems to have worked so far, but does not make me very happy. The
blocks are really wide and almost certain to sweep up legitimate
traffic sooner or later if we keep doing things this way. We have some
newer tools in use with the production networks that might make it
easier for us to rate limit aggressively at the edge rather than
applying outright blocks to large ranges.

> If you feel that you have legitimate traffic for the Beta Cluster to
> handle that has gotten swept up in one of these blocks, please reach
> out by filing a task on the #beta-cluster-infrastructure Phabricator
> board. [1]
>
> If you think working to make this process of blocking easier or
> unnecessary sounds like a fun project I would love to chat more. Hit
> me up via email, libera.chat irc, or on-wiki with your ideas.
>
> [0]: https://gerrit.wikimedia.org/r/plugins/gitiles/cloud/instance-puppet/+/refs/heads/master/deployment-prep/_.yaml
> [1]: https://phabricator.wikimedia.org/tag/beta-cluster-infrastructure/

[2]: https://phabricator.wikimedia.org/T392534
[3]: https://phabricator.wikimedia.org/T392534#10763059
[4]: https://phabricator.wikimedia.org/T392534#10763134
[5]: https://phabricator.wikimedia.org/T392534#10763235

Bryan
-- 
Bryan Davis                                        Wikimedia Foundation
Principal Software Engineer                               Boise, ID USA
[[m:User:BDavis_(WMF)]]                                      irc: bd808
_______________________________________________
Wikitech-l mailing list -- wikitech-l@lists.wikimedia.org
To unsubscribe send an email to wikitech-l-le...@lists.wikimedia.org
https://lists.wikimedia.org/postorius/lists/wikitech-l.lists.wikimedia.org/
