>> I can't easily check -current, because HTTP access to cvsweb has
>> been broken; it now insists on trying to ram HTTPS down my throat.
> Side note: it is far worse than http vs. https, it uses www/anubis
> [...JavaScript worker threads...sha256...]

> Unfortunately this kind of drastic measure has become necessary to
> protect against clearly broken AI crawlers that do not respect the
> /robots.txt standard.

Curious.  I'm not getting swamped (which would be relatively easy to
do; I'm behind a fairly slow DSL link), even though I have, from an
HTTP client's point of view, *many* NetBSD source trees and assorted
other software available.

Is it...because I don't support HTTPS at all?  Because of my border
blacklist?  Because I'm using (slightly mutant) bozohttpd instead of
something commoner?  Because I'm exporting just the software, not
things like a UI for wandering around the tree?  Because they just
haven't noticed me yet?  (*Some* crawlers certainly have.)  I can't
help thinking that it might be worth trying other approaches.  For
example, if the logs indicate the ill-behaved crawlers stick to HTTPS
(which wouldn't surprise me), maybe do the anubis thing for HTTPS but
not for HTTP?  Those with the cycles for HTTPS are more likely to have
the cycles for JS and SHA256, I feel sure.
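
If one did go that route, the split could be as simple as running the
challenge only on the TLS listener.  A rough sketch, assuming (purely
for illustration; I have no idea what the real front end looks like,
and the hostnames and ports are made up) an nginx-style reverse proxy
with anubis sitting on a local port:

    # 443: hand requests to anubis, which challenges the client and
    # then proxies on to the real cvsweb backend.
    server {
        listen 443 ssl;
        server_name cvsweb.example.org;        # illustrative name
        location / {
            proxy_pass http://127.0.0.1:8923;  # anubis
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
        }
    }

    # 80: plain HTTP goes straight to the backend, no JS challenge.
    server {
        listen 80;
        server_name cvsweb.example.org;
        location / {
            proxy_pass http://127.0.0.1:8000;  # cvsweb itself
            proxy_set_header Host $host;
        }
    }

That way clients like mine, speaking plain HTTP, never see the
challenge at all, while anything arriving over HTTPS still has to do
the proof-of-work dance.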

Personally, I'd be inclined to block the netblocks they're coming from
and send complaints to the relevant abuse contacts; those which don't
respond, or which support the misbehaviour, stay blocked, and those
which clean up their act get unblocked.  But I don't know how well that
matches up with NetBSD's tradeoffs.
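
Concretely, all I have in mind is a firewall table fed from the logs.
A sketch in pf terms (the interface name, table name, and file are
made up; npf or ipf users would write the equivalent there):

    ext_if = "wm0"          # whatever the border interface is

    # netblocks of crawlers whose operators ignored abuse complaints
    table <badcrawlers> persist file "/etc/pf.badcrawlers"
    block in quick on $ext_if from <badcrawlers> to any

    # unblock a netblock once its operator cleans up, e.g.:
    #   pfctl -t badcrawlers -T delete 192.0.2.0/24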

/~\ The ASCII                             Mouse
\ / Ribbon Campaign
 X  Against HTML                mo...@rodents-montreal.org
/ \ Email!           7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B
