Hi All,

This has been successfully deployed in production, and the code (as-is) is handling "many thousands" of connections per second from fake and legitimate bots advertising themselves as Googlebot/Bingbot/etc., with no apparent issues. The configuration we've deployed is essentially the same as provided here (and in the code base).
Anyway, if anyone else ends up finding libvmod-dns helpful, please consider it "emailware" -- i.e., drop me an email and let me know (off the record, of course) how you're making use of it. I'm curious more than anything!

-Ken

On Mon, Apr 1, 2013 at 6:21 PM, Kenneth Shaw <[email protected]> wrote:
> Hi,
>
> I spent a bit of time today developing a DNS module for Varnish. It is
> available here:
>
> https://github.com/kenshaw/libvmod-dns/
>
> The reason for this development is to cut off bots that abuse the
> User-Agent string (i.e., claiming to be Googlebot/bingbot/etc.) by doing a
> reverse and then a forward DNS lookup on the client.ip/X-Forwarded-For
> header and matching the resulting domain against a regex.
>
> The logic is meant to work something like this:
>
> sub vcl_recv {
>     # do a DNS check on "good" crawlers
>     if (req.http.user-agent ~ "(?i)(googlebot|bingbot|slurp|teoma)") {
>         # do a reverse lookup on the client.ip (X-Forwarded-For) and
>         # check that it's in the allowed domains
>         set req.http.X-Crawler-DNS-Reverse = dns.rresolve(req.http.X-Forwarded-For);
>
>         # check that the RDNS points to an allowed domain -- 403 error
>         # if it doesn't
>         if (req.http.X-Crawler-DNS-Reverse !~ "(?i)\.(googlebot\.com|search\.msn\.com|crawl\.yahoo\.net|ask\.com)$") {
>             error 403 "Forbidden";
>         }
>
>         # do a forward lookup on the reverse DNS result
>         set req.http.X-Crawler-DNS-Forward = dns.resolve(req.http.X-Crawler-DNS-Reverse);
>
>         # if it doesn't match the client.ip/X-Forwarded-For, then the
>         # user-agent is fake
>         if (req.http.X-Crawler-DNS-Forward != req.http.X-Forwarded-For) {
>             error 403 "Forbidden";
>         }
>     }
> }
>
> While this is not being used in production (yet), I plan to do so later
> this week against a production system receiving ~10,000+ requests/sec. I
> will report back afterwards.
>
> I realize the code currently has issues (memory handling, documentation,
> etc.), which will be fixed in the near future.
>
> I also realize there are better ways to head malicious bots off at the
> pass through DNS, etc. (which we are doing as well). The largest issue for
> my purposes is that it is difficult or impossible to identify all such
> traffic that way. Additionally, it is nice to be able to monitor the
> actual traffic coming through rather than dropping it completely at the
> edge.
>
> Any input/comments on what I've written so far would be greatly
> appreciated! Thanks!
>
> -Ken
>
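P.S. One note for anyone trying the above: the check resolves req.http.X-Forwarded-For, which is only present when a proxy or load balancer in front of Varnish sets it. Below is a minimal sketch (my assumption about a typical Varnish 3.x setup, not something the module itself ships) that primes the header from client.ip before the crawler check runs:

    import dns;

    sub vcl_recv {
        # Assumption: Varnish is the first hop, so no upstream proxy has
        # set X-Forwarded-For yet. Copy the connecting address into the
        # header so dns.rresolve()/dns.resolve() always have an IP to work on.
        if (!req.http.X-Forwarded-For) {
            set req.http.X-Forwarded-For = client.ip;
        }

        # ... crawler check from the snippet above goes here ...
    }

Keeping everything keyed off that one header also means X-Crawler-DNS-Reverse and X-Crawler-DNS-Forward can be logged per request, which is exactly the kind of crawler-traffic monitoring mentioned above.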
