Re: Apache environment variables - logical AND

2008-11-05 Thread Jeremy Chadwick
On Wed, Nov 05, 2008 at 08:24:16PM +1100, Ian Smith wrote:
> On Tue, 4 Nov 2008, Jeremy Chadwick wrote:
>  > On Wed, Nov 05, 2008 at 05:33:45PM +1100, Ian Smith wrote:
>  > > I know this isn't FreeBSD specific - but I am, so crave your indulgence.
>  > > 
>  > > Running Apache 1.3.27, using a fairly extensive access.conf to beat off 
>  > > the most rapacious robots and such, using mostly BrowserMatch[NoCase] 
>  > > and SetEnvIf to moderate access to several virtual hosts.  No problem.
>  > > 
>  > > OR conditions are of course straightforward:
>  > > 
>  > >   SetEnvIf  somevar
>  > >   SetEnvIf  somevar
>  > >   SetEnvIf  !somevar
>  > > 
>  > > What I can't figure out is how to set a variable3 if and only if both 
>  > > variable1 AND variable2 are set.  Eg:
>  > > 
>  > >   SetEnvIf Referer "^$" no_referer
>  > >   SetEnvIf User-Agent "^$" no_browser
>  > > 
>  > > I want the equivalent for this (invalid and totally fanciful) match: 
>  > > 
>  > >   SetEnvIf (no_browser AND no_referer) go_away
>  > 
>  > Sounds like a job for mod_rewrite.  The SetEnvIf stuff is such a hack.
> 
> It may be a hack, but I've found it an extremely useful one so far.
>
>  > This is what we use on our production servers (snipped to keep it
>  > short):
>  > 
>  > RewriteEngine on
>  > RewriteCond %{HTTP_REFERER} ^:  [OR]
>  > RewriteCond %{HTTP_REFERER} ^http://forums.somethingawful.com/  [OR]
>  > RewriteCond %{HTTP_REFERER} ^http://forums.fark.com/  [OR]
>  > RewriteCond %{HTTP_USER_AGENT} ^Alexibot  [OR]
>  > RewriteCond %{HTTP_USER_AGENT} ^asterias  [OR]
>  > RewriteCond %{HTTP_USER_AGENT} ^BackDoorBot  [OR]
>  > RewriteCond %{HTTP_USER_AGENT} ^Black.Hole  [NC,OR]
>  > RewriteCond %{HTTP_USER_AGENT} ^WWWOFFLE  [OR]
>  > RewriteCond %{HTTP_USER_AGENT} ^Xaldon.WebSpider
>  > RewriteRule ^.* - [F,L]
>  > 
>  > You need to keep something in mind however: blocking by user agent is
>  > basically worthless these days.  Most "leeching" tools now let you
>  > spoof the user agent to show up as Internet Explorer, essentially
>  > defeating the checks.
> 
> While that's true, I've found most of the more troublesome robots are 
> too proud of their 'brand' to spoof user agent, and those that do are a) 
> often consistent enough in their Remote_Addr to exclude by subnet and/or 
> b) often make obvious errors in spoofed User_Agent strings .. especially 
> those pretending to be some variant of MSIE :)

I haven't found this to be true at all, and I've been doing web hosting
since 1993.  In the past 2-3 years, the number of leeching tools that
spoof their User-Agent has increased dramatically.

But step back for a moment and look at it from a usability perspective,
because this is what really happens.

A user tries to leech a site you host, using FruitBatLeecher, which your
Apache server blocks based on User-Agent.  The user has no idea why the
leech program doesn't work.  Does the user simply give up his quest?
Absolutely not -- the user then goes and finds BobsBandwidthZilla which
pretends to be Internet Explorer, Firefox, or lynx, and downloads the
site.

Now, if you're trying to block robots/scrapers which aren't honouring
robots.txt, oh yes, that almost always works, because those rarely spoof
their User-Agent (I think to date I've only seen one site which did
that, and it was some Russian search engine).

If you feel I'm just doing burn-outs arguing, a la "BSD style", let me
give you some insight into how often I deal with this problem: daily.

We host a very specific/niche site that contains over 20 years of
technical information on the Famicom / Nintendo Entertainment System.
The site has hundreds of megabytes of information, and a very active
forum.  Some jackass comes along and decides "Wow, this has all the info
I want!" and fires off a leeching program against the entire
domain/vhost.  Let's say the program he's using is blocked by our
User-Agent blocks; there is a 6-7 minute delay as the user goes off to
find another program to leech with, installs it, and attempts it again.
Pow, it works, and we find nice huge spikes in our logs for the vhost
indicating someone got around it.  I later dig through our access_log and
find that he tried to use FruitBatLeecher, which got blocked, but then
6-7 minutes later came back with a leeching client that spoofs itself
as IE.

And it gets worse.

Many of these leeching programs get stuck in infinite loops when it
comes to forum software, so they sit there pounding on the webserver
indefinitely.  It requires administrator intervention to stop it; in my
case, I don't even bother with Apache ACLs, because ~70% of the time
the client ignores 403s and keeps bashing away (yes really!) -- I go
straight for a pf-based block in a table called .  These
guys will hit that block for *days* -- that should give you some idea
how long they'll let that program run.

But it gets worse -- again.

Recently, I found t

Re: Apache environment variables - logical AND

2008-11-05 Thread Ian Smith
On Tue, 4 Nov 2008, Jeremy Chadwick wrote:
 > On Wed, Nov 05, 2008 at 05:33:45PM +1100, Ian Smith wrote:
 > > I know this isn't FreeBSD specific - but I am, so crave your indulgence.
 > > 
 > > Running Apache 1.3.27, using a fairly extensive access.conf to beat off 
 > > the most rapacious robots and such, using mostly BrowserMatch[NoCase] 
 > > and SetEnvIf to moderate access to several virtual hosts.  No problem.
 > > 
 > > OR conditions are of course straightforward:
 > > 
 > >   SetEnvIf  somevar
 > >   SetEnvIf  somevar
 > >   SetEnvIf  !somevar
 > > 
 > > What I can't figure out is how to set a variable3 if and only if both 
 > > variable1 AND variable2 are set.  Eg:
 > > 
 > >   SetEnvIf Referer "^$" no_referer
 > >   SetEnvIf User-Agent "^$" no_browser
 > > 
 > > I want the equivalent for this (invalid and totally fanciful) match: 
 > > 
 > >   SetEnvIf (no_browser AND no_referer) go_away
 > 
 > Sounds like a job for mod_rewrite.  The SetEnvIf stuff is such a hack.

It may be a hack, but I've found it an extremely useful one so far.

 > This is what we use on our production servers (snipped to keep it
 > short):
 > 
 > RewriteEngine on
 > RewriteCond %{HTTP_REFERER} ^:  [OR]
 > RewriteCond %{HTTP_REFERER} ^http://forums.somethingawful.com/  [OR]
 > RewriteCond %{HTTP_REFERER} ^http://forums.fark.com/  [OR]
 > RewriteCond %{HTTP_USER_AGENT} ^Alexibot  [OR]
 > RewriteCond %{HTTP_USER_AGENT} ^asterias  [OR]
 > RewriteCond %{HTTP_USER_AGENT} ^BackDoorBot  [OR]
 > RewriteCond %{HTTP_USER_AGENT} ^Black.Hole  [NC,OR]
 > RewriteCond %{HTTP_USER_AGENT} ^WWWOFFLE  [OR]
 > RewriteCond %{HTTP_USER_AGENT} ^Xaldon.WebSpider
 > RewriteRule ^.* - [F,L]
 > 
 > You need to keep something in mind however: blocking by user agent is
 > basically worthless these days.  Most "leeching" tools now let you
 > spoof the user agent to show up as Internet Explorer, essentially
 > defeating the checks.

While that's true, I've found most of the more troublesome robots are 
too proud of their 'brand' to spoof user agent, and those that do are a) 
often consistent enough in their Remote_Addr to exclude by subnet and/or 
b) often make obvious errors in spoofed User_Agent strings .. especially 
those pretending to be some variant of MSIE :)

 > If you're that concerned about bandwidth (which is why a lot of people
 > do the above), consider rate-limiting.  It's really, quite honestly, the
 > only method that is fail-safe.

Thanks Jeremy.  Certainly time to take the time to have another look at 
mod_rewrite, especially regarding redirection, alternative pages etc, 
but I still tend to glaze over about halfway through all that section.

And unless I've completely missed it, your examples don't address my 
question, being how to AND two or more conditions in a particular test?

If I really can't do this with mod_setenvif I'll have to take that time.
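
For reference, one way this might be done with mod_setenvif alone. This
is an untested sketch, not a confirmed answer. It relies on the
documented behaviour that an attribute which is neither a special keyword
nor a request header name is looked up as an environment variable, so a
later SetEnvIf can test the result of an earlier one. It also assumes
that an unset variable is compared as an empty string, which is the same
behaviour the "^$" tests on missing headers depend on. Whether both hold
on 1.3.27 would need checking.

```apache
# Flags from the example above.
SetEnvIf Referer    "^$" no_referer
SetEnvIf User-Agent "^$" no_browser

# Step 1: tentatively set go_away whenever no_referer is set
# (its default value "1" matches ".").
SetEnvIf no_referer "."  go_away

# Step 2: remove go_away again if no_browser is NOT set
# (assumption: an unset variable is tested as "", which matches "^$").
SetEnvIf no_browser "^$" !go_away

# go_away should now survive only when both flags were set.
<Directory "/path/to/docroot">
    Order Allow,Deny
    Allow from all
    Deny from env=go_away
</Directory>
```

The Directory path is a placeholder.  Order matters here: the two
go_away lines have to come after the lines that set no_referer and
no_browser.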

cheers, Ian
___
freebsd-questions@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-questions
To unsubscribe, send any mail to "[EMAIL PROTECTED]"


Re: Apache environment variables - logical AND

2008-11-04 Thread Jeremy Chadwick
On Wed, Nov 05, 2008 at 05:33:45PM +1100, Ian Smith wrote:
> I know this isn't FreeBSD specific - but I am, so crave your indulgence.
> 
> Running Apache 1.3.27, using a fairly extensive access.conf to beat off 
> the most rapacious robots and such, using mostly BrowserMatch[NoCase] 
> and SetEnvIf to moderate access to several virtual hosts.  No problem.
> 
> OR conditions are of course straightforward:
> 
>   SetEnvIf  somevar
>   SetEnvIf  somevar
>   SetEnvIf  !somevar
> 
> What I can't figure out is how to set a variable3 if and only if both 
> variable1 AND variable2 are set.  Eg:
> 
>   SetEnvIf Referer "^$" no_referer
>   SetEnvIf User-Agent "^$" no_browser
> 
> I want the equivalent for this (invalid and totally fanciful) match: 
> 
>   SetEnvIf (no_browser AND no_referer) go_away

Sounds like a job for mod_rewrite.  The SetEnvIf stuff is such a hack.

This is what we use on our production servers (snipped to keep it
short):

RewriteEngine on
RewriteCond %{HTTP_REFERER} ^:  [OR]
RewriteCond %{HTTP_REFERER} ^http://forums.somethingawful.com/  [OR]
RewriteCond %{HTTP_REFERER} ^http://forums.fark.com/  [OR]
RewriteCond %{HTTP_USER_AGENT} ^Alexibot  [OR]
RewriteCond %{HTTP_USER_AGENT} ^asterias  [OR]
RewriteCond %{HTTP_USER_AGENT} ^BackDoorBot  [OR]
RewriteCond %{HTTP_USER_AGENT} ^Black.Hole  [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^WWWOFFLE  [OR]
RewriteCond %{HTTP_USER_AGENT} ^Xaldon.WebSpider
RewriteRule ^.* - [F,L]
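
For the AND case specifically, a minimal sketch (not taken from the
production ruleset above, and untested on 1.3.27): consecutive
RewriteCond lines without the [OR] flag are implicitly ANDed, so the rule
only fires when every condition matches.

```apache
RewriteEngine on
# Both conditions must match: no [OR] flag means logical AND.
RewriteCond %{HTTP_REFERER}    ^$
RewriteCond %{HTTP_USER_AGENT} ^$
# Refuse the request with 403 Forbidden.
RewriteRule ^.* - [F,L]
```

This mirrors the no_referer / no_browser example: a request is refused
only when both the Referer and the User-Agent are empty or missing.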

You need to keep something in mind however: blocking by user agent is
basically worthless these days.  Most "leeching" tools now let you
spoof the user agent to show up as Internet Explorer, essentially
defeating the checks.

If you're that concerned about bandwidth (which is why a lot of people
do the above), consider rate-limiting.  It's really, quite honestly, the
only method that is fail-safe.

-- 
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.              PGP: 4BD6C0CB |
