>> I can't easily check -current, because HTTP access to cvsweb has
>> been broken; it now insists on trying to ram HTTPS down my throat.

> Side note: it is far worse than http vs. https, it uses www/anubis
> [...JavaScript worker threads...sha256...]
> Unfortunately this kind of drastic measures have become necessary to
> protect against clearly broken AI crawlers that do not respect the
> /robots.txt standard.

Curious.  I'm not getting swamped (which would be relatively easy to
do; I'm behind a fairly slow DSL link), even though I have, from an
HTTP client's point of view, *many* NetBSD source trees and assorted
other software available.

Is it...because I don't support HTTPS at all?  Because of my border
blacklist?  Because I'm using (slightly mutant) bozohttpd instead of
something commoner?  Because I'm exporting just the software, without
things like a UI for wandering around the tree?  Because they just
haven't noticed me yet?  (*Some* crawlers certainly have.)

I can't help thinking it might be worth trying other approaches.  For
example, if the logs indicate the ill-behaved crawlers stick to HTTPS
(which wouldn't surprise me), maybe do the anubis thing for HTTPS but
not for HTTP?  Those with the cycles for HTTPS are more likely to have
the cycles for JS and SHA256, I feel sure.

Personally, I'd be inclined to block the netblocks they're coming from
and complain to their abuse contacts; those which don't respond, or
which support the misbehaviour, stay blocked, while those which clean
up their act get unblocked.  But I don't know how well that matches up
with NetBSD's tradeoffs.

/~\ The ASCII                             Mouse
\ / Ribbon Campaign
 X  Against HTML                mo...@rodents-montreal.org
/ \ Email!           7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B
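
P.S.: To sketch what I mean by the HTTPS-only idea: assuming something
like nginx sits in front (I have no idea what the project actually
runs; the hostname, ports, and paths below are all invented), it could
look roughly like this, with plain HTTP bypassing the challenge and
HTTPS going through anubis first:

    # Plain HTTP: straight through to the real web server, no challenge.
    server {
        listen 80;
        server_name cvsweb.example.org;
        location / {
            proxy_pass http://127.0.0.1:8080;   # the actual cvsweb httpd
        }
    }

    # HTTPS: hand requests to anubis first; it runs its JS/sha256
    # challenge and then proxies the survivors to the same back end.
    server {
        listen 443 ssl;
        server_name cvsweb.example.org;
        ssl_certificate     /path/to/cert.pem;
        ssl_certificate_key /path/to/key.pem;
        location / {
            proxy_pass http://127.0.0.1:8923;   # wherever anubis listens
        }
    }

Untested, obviously, and only meant to show that the split can be done
entirely at the front end, without touching cvsweb itself.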