Re: Apache environment variables - logical AND
On Wed, Nov 05, 2008 at 08:24:16PM +1100, Ian Smith wrote:
> On Tue, 4 Nov 2008, Jeremy Chadwick wrote:
> > On Wed, Nov 05, 2008 at 05:33:45PM +1100, Ian Smith wrote:
> > > I know this isn't FreeBSD specific - but I am, so crave your indulgence.
> > >
> > > Running Apache 1.3.27, using a fairly extensive access.conf to beat off
> > > the most rapacious robots and such, using mostly BrowserMatch[NoCase]
> > > and SetEnvIf to moderate access to several virtual hosts. No problem.
> > >
> > > OR conditions are of course straightforward:
> > >
> > >   SetEnvIf somevar
> > >   SetEnvIf somevar
> > >   SetEnvIf !somevar
> > >
> > > What I can't figure out is how to set a variable3 if and only if both
> > > variable1 AND variable2 are set. Eg:
> > >
> > >   SetEnvIf Referer "^$" no_referer
> > >   SetEnvIf User-Agent "^$" no_browser
> > >
> > > I want the equivalent for this (invalid and totally fanciful) match:
> > >
> > >   SetEnvIf (no_browser AND no_referer) go_away
> >
> > Sounds like a job for mod_rewrite. The SetEnvIf stuff is such a hack.
>
> It may be a hack, but I've found it an extremely useful one so far.
>
> > This is what we use on our production servers (snipped to keep it
> > short):
> >
> >   RewriteEngine on
> >   RewriteCond %{HTTP_REFERER} ^: [OR]
> >   RewriteCond %{HTTP_REFERER} ^http://forums.somethingawful.com/ [OR]
> >   RewriteCond %{HTTP_REFERER} ^http://forums.fark.com/ [OR]
> >   RewriteCond %{HTTP_USER_AGENT} ^Alexibot [OR]
> >   RewriteCond %{HTTP_USER_AGENT} ^asterias [OR]
> >   RewriteCond %{HTTP_USER_AGENT} ^BackDoorBot [OR]
> >   RewriteCond %{HTTP_USER_AGENT} ^Black.Hole [NC,OR]
> >   RewriteCond %{HTTP_USER_AGENT} ^WWWOFFLE [OR]
> >   RewriteCond %{HTTP_USER_AGENT} ^Xaldon.WebSpider
> >   RewriteRule ^.* - [F,L]
> >
> > You need to keep something in mind however: blocking by user agent is
> > basically worthless these days. Most "leeching" tools now let you
> > spoof the user agent to show up as Internet Explorer, essentially
> > defeating the checks.
>
> While that's true, I've found most of the more troublesome robots are
> too proud of their 'brand' to spoof user agent, and those that do are a)
> often consistent enough in their Remote_Addr to exclude by subnet and/or
> b) often make obvious errors in spoofed User_Agent strings .. especially
> those pretending to be some variant of MSIE :)

I haven't found this to be true at all, and I've been doing web hosting
since 1993. In the past 2-3 years, the number of leeching tools which
spoof their User-Agent has increased dramatically.

But step back for a moment and look at it from a usability perspective,
because this is what really happens. A user tries to leech a site you
host, using FruitBatLeecher, which your Apache server blocks based on
User-Agent. The user has no idea why the leech program doesn't work.
Does the user simply give up his quest? Absolutely not -- the user then
goes and finds BobsBandwidthZilla which pretends to be Internet
Explorer, Firefox, or lynx, and downloads the site.

Now, if you're trying to block robots/scrapers which aren't honouring
robots.txt, oh yes, that almost always works, because those rarely
spoof their User-Agent (I think to date I've only seen one site which
did that, and it was some Russian search engine).

If you feel I'm just doing burn-outs arguing, a la "BSD style", let me
give you some insight into how often I deal with this problem: daily.

We host a very specific/niche site that contains over 20 years of
technical information on the Famicom / Nintendo Entertainment System.
The site has hundreds of megabytes of information, and a very active
forum. Some jackass comes along and decides "Wow, this has all the info
I want!" and fires off a leeching program against the entire
domain/vhost. Let's say the program he's using is blocked by our
User-Agent blocks; there is a 6-7 minute delay as the user goes off to
find another program to leech with, installs it, and tries again.

Pow, it works, and we find nice huge spikes in our logs for the vhost
indicating someone got around it. I later dig through our access_log and
find that he tried to use FruitBatLeecher, which got blocked, but then
6-7 minutes later came back with a leeching client that spoofs itself
as IE.

And it gets worse. Many of these leeching programs get stuck in
infinite loops when it comes to forum software, so they sit there
pounding on the webserver indefinitely. It requires administrator
intervention to stop them; in my case, I don't even bother with Apache
ACLs, because ~70% of the time the client ignores 403s and keeps bashing
away (yes really!) -- I go straight for a pf-based block in a table
called . These guys will hit that block for *days* -- that should give
you some idea how long they'll let that program run.

But it gets worse -- again. Recently, I found t
Re: Apache environment variables - logical AND
On Tue, 4 Nov 2008, Jeremy Chadwick wrote:
> On Wed, Nov 05, 2008 at 05:33:45PM +1100, Ian Smith wrote:
> > I know this isn't FreeBSD specific - but I am, so crave your indulgence.
> >
> > Running Apache 1.3.27, using a fairly extensive access.conf to beat off
> > the most rapacious robots and such, using mostly BrowserMatch[NoCase]
> > and SetEnvIf to moderate access to several virtual hosts. No problem.
> >
> > OR conditions are of course straightforward:
> >
> >   SetEnvIf somevar
> >   SetEnvIf somevar
> >   SetEnvIf !somevar
> >
> > What I can't figure out is how to set a variable3 if and only if both
> > variable1 AND variable2 are set. Eg:
> >
> >   SetEnvIf Referer "^$" no_referer
> >   SetEnvIf User-Agent "^$" no_browser
> >
> > I want the equivalent for this (invalid and totally fanciful) match:
> >
> >   SetEnvIf (no_browser AND no_referer) go_away
>
> Sounds like a job for mod_rewrite. The SetEnvIf stuff is such a hack.

It may be a hack, but I've found it an extremely useful one so far.

> This is what we use on our production servers (snipped to keep it
> short):
>
>   RewriteEngine on
>   RewriteCond %{HTTP_REFERER} ^: [OR]
>   RewriteCond %{HTTP_REFERER} ^http://forums.somethingawful.com/ [OR]
>   RewriteCond %{HTTP_REFERER} ^http://forums.fark.com/ [OR]
>   RewriteCond %{HTTP_USER_AGENT} ^Alexibot [OR]
>   RewriteCond %{HTTP_USER_AGENT} ^asterias [OR]
>   RewriteCond %{HTTP_USER_AGENT} ^BackDoorBot [OR]
>   RewriteCond %{HTTP_USER_AGENT} ^Black.Hole [NC,OR]
>   RewriteCond %{HTTP_USER_AGENT} ^WWWOFFLE [OR]
>   RewriteCond %{HTTP_USER_AGENT} ^Xaldon.WebSpider
>   RewriteRule ^.* - [F,L]
>
> You need to keep something in mind however: blocking by user agent is
> basically worthless these days. Most "leeching" tools now let you
> spoof the user agent to show up as Internet Explorer, essentially
> defeating the checks.

While that's true, I've found most of the more troublesome robots are
too proud of their 'brand' to spoof user agent, and those that do are a)
often consistent enough in their Remote_Addr to exclude by subnet and/or
b) often make obvious errors in spoofed User_Agent strings .. especially
those pretending to be some variant of MSIE :)

> If you're that concerned about bandwidth (which is why a lot of people
> do the above), consider rate-limiting. It's really, quite honestly, the
> only method that is fail-safe.

Thanks Jeremy. It's certainly time to have another look at mod_rewrite,
especially regarding redirection, alternative pages etc, but I still
tend to glaze over about halfway through all that section.

And unless I've completely missed it, your examples don't address my
question, which is how to AND two or more conditions in a particular
test. If I really can't do this with mod_setenvif I'll have to take
that time.

cheers, Ian

___
freebsd-questions@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-questions
To unsubscribe, send any mail to "[EMAIL PROTECTED]"
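
For the AND question asked above, one possible approach that stays within
mod_setenvif is sketched here. It is only a sketch under assumptions: it
assumes the installed mod_setenvif can test environment variables set by
earlier SetEnvIf lines (worth checking against the mod_setenvif
documentation for the Apache version in use), and the directory path is
hypothetical.

```apache
# Flag requests whose Referer / User-Agent header is empty or missing.
SetEnvIf Referer    "^$" no_referer
SetEnvIf User-Agent "^$" no_browser

# Provisionally set go_away whenever no_referer is set.
# (SetEnvIf assigns the value "1" when no value is given.)
SetEnvIf no_referer "^1$" go_away

# Clear go_away again when no_browser is NOT set. An unset name is
# matched as an empty string, so "^$" here means "no_browser is unset".
SetEnvIf no_browser "^$" !go_away

# Hypothetical path, for illustration only.
<Directory "/usr/local/www/data">
    Order Allow,Deny
    Allow from all
    Deny from env=go_away
</Directory>
```

SetEnvIf lines are evaluated in the order they appear, so go_away ends up
set only when both no_referer and no_browser were set. The set-then-unset
pair is a workaround for the missing AND operator; it was not proposed in
the thread itself.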
Re: Apache environment variables - logical AND
On Wed, Nov 05, 2008 at 05:33:45PM +1100, Ian Smith wrote:
> I know this isn't FreeBSD specific - but I am, so crave your indulgence.
>
> Running Apache 1.3.27, using a fairly extensive access.conf to beat off
> the most rapacious robots and such, using mostly BrowserMatch[NoCase]
> and SetEnvIf to moderate access to several virtual hosts. No problem.
>
> OR conditions are of course straightforward:
>
>   SetEnvIf somevar
>   SetEnvIf somevar
>   SetEnvIf !somevar
>
> What I can't figure out is how to set a variable3 if and only if both
> variable1 AND variable2 are set. Eg:
>
>   SetEnvIf Referer "^$" no_referer
>   SetEnvIf User-Agent "^$" no_browser
>
> I want the equivalent for this (invalid and totally fanciful) match:
>
>   SetEnvIf (no_browser AND no_referer) go_away

Sounds like a job for mod_rewrite. The SetEnvIf stuff is such a hack.

This is what we use on our production servers (snipped to keep it
short):

  RewriteEngine on
  RewriteCond %{HTTP_REFERER} ^: [OR]
  RewriteCond %{HTTP_REFERER} ^http://forums.somethingawful.com/ [OR]
  RewriteCond %{HTTP_REFERER} ^http://forums.fark.com/ [OR]
  RewriteCond %{HTTP_USER_AGENT} ^Alexibot [OR]
  RewriteCond %{HTTP_USER_AGENT} ^asterias [OR]
  RewriteCond %{HTTP_USER_AGENT} ^BackDoorBot [OR]
  RewriteCond %{HTTP_USER_AGENT} ^Black.Hole [NC,OR]
  RewriteCond %{HTTP_USER_AGENT} ^WWWOFFLE [OR]
  RewriteCond %{HTTP_USER_AGENT} ^Xaldon.WebSpider
  RewriteRule ^.* - [F,L]

You need to keep something in mind however: blocking by user agent is
basically worthless these days. Most "leeching" tools now let you
spoof the user agent to show up as Internet Explorer, essentially
defeating the checks.

If you're that concerned about bandwidth (which is why a lot of people
do the above), consider rate-limiting. It's really, quite honestly, the
only method that is fail-safe.

-- 
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.              PGP: 4BD6C0CB |
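
For reference, consecutive RewriteCond lines that carry no [OR] flag are
ANDed together, so the two-condition test asked about in this thread could
be sketched with mod_rewrite roughly as follows. This is a minimal
illustration, not the configuration used on the production servers
mentioned above.

```apache
RewriteEngine on

# No [OR] flag on the first condition, so BOTH conditions must match:
# an empty (or missing) Referer AND an empty (or missing) User-Agent.
RewriteCond %{HTTP_REFERER}    ^$
RewriteCond %{HTTP_USER_AGENT} ^$

# Answer matching requests with 403 Forbidden and stop processing.
RewriteRule ^.* - [F,L]
```

If an environment variable is wanted instead of an immediate 403, the
rule's flags could be swapped for something like [E=go_away:1], which sets
go_away for later access checks.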