Re: Script help needed please

2003-08-14 Thread Alexander Haderer
At 08:49 14.08.2003 -0500, Jack L. Stone wrote:
...
When we started providing the articles 6-7 years ago, folks used browsers
to read them. Now the trend has shifted to a lazier approach, and there is
increasing use of those download utilities that can be left unattended to
download an entire web site, taking several hours to do so. Multiply this
by a number of similar downloads and there goes the bandwidth, denying the
normal online readers the speed needed for loading and browsing in the
manner intended. Several hundred will be reading at a time, and several
thousand daily.

A possible solution?
What comes to my mind:

- Offer zip/tar.gz archives of the articles via an FTP server to your
  customers (a rough sketch follows below).
- Allow your customers' servers to mirror your FTP server.
- Possibly set up a mailing list to inform your customers about
  changes/updates.

Of course you can additionally install some bandwidth limitation software
(but I can't recommend a specific one, sorry).
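A minimal sketch of the archive idea, assuming example paths and a nightly
cron job, might look like this:

#!/bin/sh
# Nightly snapshot of the article tree for FTP pickup.
# NOTE: the paths below are examples only - adjust them to your layout.

DST=/var/ftp/pub/articles.tar.gz        # file the FTP server exposes

# -C keeps the paths inside the archive relative, and building into a
# temporary file first means mirrors never see a half-written archive.
tar -czf ${DST}.new -C /usr/local/www/data articles && mv ${DST}.new ${DST}

Run it from cron in the small hours and point customers (or their mirror
scripts) at the FTP URL; with the archive available there is much less
reason for anyone to spider the whole site over HTTP.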

Alexander

___
[EMAIL PROTECTED] mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-questions
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: Script help needed please

2003-08-14 Thread Jack L. Stone
At 03:44 PM 8.14.2003 +0100, Jez Hancock wrote:
On Thu, Aug 14, 2003 at 08:49:49AM -0500, Jack L. Stone wrote:
 Server Version: Apache/1.3.27 (Unix) FrontPage/5.0.2.2510 PHP/4.3.1
 The above is typical of the servers in use, with csh shells employed, plus
 IPFW.
 
 My apologies for the length of this question, but the background, kept as
 brief as I can make it, seems necessary for the question to make sense.
 
 The problem:
 We have several servers that provide online reading of technical articles,
 and each has several hundred MB to a GB of content.
 
 When we started providing the articles 6-7 years ago, folks used browsers
 to read them. Now the trend has shifted to a lazier approach, and there is
 increasing use of those download utilities that can be left unattended to
 download an entire web site, taking several hours to do so. Multiply this
 by a number of similar downloads and there goes the bandwidth, denying the
 normal online readers the speed needed for loading and browsing in the
 manner intended. Several hundred will be reading at a time, and several
 thousand daily.
snip
There is no easy solution to this, but one avenue might be to look at
bandwidth throttling in an Apache module.

One that I've used before is mod_throttle which is in the ports:

/usr/ports/www/mod_throttle

which allows you to throttle users by IP address to a certain number of
documents and/or up to a certain transfer limit.  IIRC it's fairly
limited, though, in that you can only apply per-IP limits to _every_
virtual host - i.e. in the global httpd.conf context.

A more fine-grained solution (from what I've read; I haven't tried it) is
mod_bwshare - this one isn't in the ports but can be found here:

http://www.topology.org/src/bwshare/

This module overcomes some of the shortfalls of mod_throttle and gives you
finer-grained control over who consumes how much bandwidth over what time
period.
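Just for reference, a mod_throttle setup in httpd.conf looks roughly like
the following - I'm quoting the directive from memory, so check the
module's documentation for the exact policy names and units before using
it:

<IfModule mod_throttle.c>
    # Limit each client to roughly 10 MB of transfer per day.
    # (Policy name and argument order are from memory - verify against
    # the mod_throttle docs.)
    ThrottlePolicy Volume 10M 1d
</IfModule>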

 Now, my question: Is it possible to write a script that can constantly scan
 the Apache logs for the footprints of those downloaders, perhaps their
 names, HTTRACK being one I see a lot? Whenever I see one of those sessions,
 I have been able to abort it by adding a rule to the firewall to deny the
 IP address access to the server. This aborts the downloading, but I have
 seen the attempts continue constantly for a day or two, confirming
 unattended downloads.
 
 Thus, if the script could spot an offender, use the firewall to add a rule
 containing the offender's IP address, and then flush and reload the
 firewall, this would at least abort the download and free up the bandwidth
 (I already have a script that restarts the firewall).
 
 Is this possible, and how would I go about it?
If you really wanted to go down this route, there is a script someone
wrote a while back to find 'rude robots' in an httpd logfile, which you
could perhaps adapt to do dynamic filtering in conjunction with your
firewall:

http://stein.cshl.org/~lstein/talks/perl_conference/cute_tricks/log9.html

If you have any success let me know.
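If it helps, here is a rough, untested sketch of that kind of log watcher:
it pulls out of the access log the IP addresses whose User-Agent matches a
known downloader and adds an ipfw deny rule for each. It assumes the
'combined' log format, and the log path, agent list and rule number are
only placeholders to adapt:

#!/bin/sh
# block-leechers.sh - rough sketch only; run from cron every few minutes.
# Assumes Apache's "combined" log format (User-Agent on each line) and ipfw.

LOG=/var/log/httpd-access.log
AGENTS='HTTrack|WebZIP|Teleport|Wget'   # User-Agent substrings to block
RULE=2000                               # ipfw rule number for the denies

# IPs already denied, so duplicate rules aren't added on every run.
ipfw list | awk '/deny ip from/ { print $5 }' > /tmp/blocked.$$

awk -v agents="$AGENTS" '$0 ~ agents { print $1 }' $LOG | sort -u |
while read ip; do
    if ! grep -qxF "$ip" /tmp/blocked.$$; then
        logger "blocking offline downloader at $ip"
        ipfw add $RULE deny ip from $ip to any
    fi
done

rm -f /tmp/blocked.$$

Since ipfw add takes effect immediately, there should be no need to flush
and restart the whole firewall; a matching cleanup job could delete rule
2000 periodically so the blocks don't become permanent.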

-- 
Jez


Interesting. Looks like a step in the right direction. Will weigh this one
along with the other possibilities.

Many thanks...!

Best regards,
Jack L. Stone,
Administrator

SageOne Net
http://www.sage-one.net
[EMAIL PROTECTED]
___
[EMAIL PROTECTED] mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-questions
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: Script help needed please

2003-08-14 Thread Michael Conlen
Jack,

You can set up Apache to deny access to people using that browser. The 
catch is that it's easy to work around by changing the browser string. If 
they are desperate enough to keep trying after you deny access to HTTRACK 
and other such clients, you can place a link that no human would follow, 
pointing at a CGI that adds the firewall rule to deny them access. You 
probably want it to return some data and wait a bit so the user can't 
easily figure out which URL is killing their access.
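For the browser-string part, something along these lines in httpd.conf
should work on Apache 1.3 with the standard mod_setenvif module; the agent
strings and the directory path are only examples, and as noted a determined
user can fake the string:

# Tag requests from the usual offline downloaders...
SetEnvIfNoCase User-Agent "HTTrack"  bad_bot
SetEnvIfNoCase User-Agent "WebZIP"   bad_bot
SetEnvIfNoCase User-Agent "Teleport" bad_bot

# ...and refuse them the article tree.
<Directory "/usr/local/www/data">
    Order Allow,Deny
    Allow from all
    Deny from env=bad_bot
</Directory>

For the trap link, rather than having the CGI call ipfw directly (which
would mean giving the web server more privilege than it should have), a
safer sketch is to have the CGI just log REMOTE_ADDR to a file that a root
cron job picks up and feeds to the firewall.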

You can also state on your website that users are not allowed to access 
the site with non-interactive browsers. Then, when you find them, you send 
a nastygram to their ISP and notify them that continued abuse could be a 
crime under the Computer Fraud and Abuse Act (if both you and they are in 
the US), and let their ISP take care of it.

--
Michael Conlen
Jack L. Stone wrote:

Server Version: Apache/1.3.27 (Unix) FrontPage/5.0.2.2510 PHP/4.3.1
The above is typical of the servers in use, with csh shells employed, plus
IPFW.
My apologies for the length of this question, but the background, kept as
brief as I can make it, seems necessary for the question to make sense.
The problem:
We have several servers that provide online reading of technical articles,
and each has several hundred MB to a GB of content.
When we started providing the articles 6-7 years ago, folks used browsers
to read them. Now the trend has shifted to a lazier approach, and there is
increasing use of those download utilities that can be left unattended to
download an entire web site, taking several hours to do so. Multiply this
by a number of similar downloads and there goes the bandwidth, denying the
normal online readers the speed needed for loading and browsing in the
manner intended. Several hundred will be reading at a time, and several
thousand daily.
Further, those download utilities do not discriminate among the files
downloaded unless the user sets them to exclude the types of files they
don't need for the articles. Most users don't bother to set the
parameters; they just turn the tools loose and go about their day. The
result is essentially a DoS for the normal readers who notice the
slowdown, though not one made with malice.
This method downloads a tremendous amount of unnecessary content. Some
downloaders have been contacted and asked to stop (when we can spot an
email address from a login); in response they simply weren't aware of the
problems they were causing and agreed to at least spread their downloads
over longer periods of time. I can live with that.
A possible solution?
Now, my question: Is it possible to write a script that can constantly scan
the Apache logs for the footprints of those downloaders, perhaps their
names, HTTRACK being one I see a lot? Whenever I see one of those sessions,
I have been able to abort it by adding a rule to the firewall to deny the
IP address access to the server. This aborts the downloading, but I have
seen the attempts continue constantly for a day or two, confirming
unattended downloads.
Thus, if the script could spot an offender, use the firewall to add a rule
containing the offender's IP address, and then flush and reload the
firewall, this would at least abort the download and free up the bandwidth
(I already have a script that restarts the firewall).
Is this possible, and how would I go about it?

Many thanks for any ideas on this!

Best regards,
Jack L. Stone,
Administrator
SageOne Net
http://www.sage-one.net
[EMAIL PROTECTED]

___
[EMAIL PROTECTED] mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-questions
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: Script help needed please

2003-08-14 Thread Jez Hancock
On Thu, Aug 14, 2003 at 08:49:49AM -0500, Jack L. Stone wrote:
 Server Version: Apache/1.3.27 (Unix) FrontPage/5.0.2.2510 PHP/4.3.1
 The above is typical of the servers in use, with csh shells employed, plus
 IPFW.
 
 My apologies for the length of this question, but the background, kept as
 brief as I can make it, seems necessary for the question to make sense.
 
 The problem:
 We have several servers that provide online reading of technical articles,
 and each has several hundred MB to a GB of content.
 
 When we started providing the articles 6-7 years ago, folks used browsers
 to read them. Now the trend has shifted to a lazier approach, and there is
 increasing use of those download utilities that can be left unattended to
 download an entire web site, taking several hours to do so. Multiply this
 by a number of similar downloads and there goes the bandwidth, denying the
 normal online readers the speed needed for loading and browsing in the
 manner intended. Several hundred will be reading at a time, and several
 thousand daily.
snip
There is no easy solution to this, but one avenue might be to look at
bandwidth throttling in an Apache module.

One that I've used before is mod_throttle which is in the ports:

/usr/ports/www/mod_throttle

which allows you to throttle users by IP address to a certain number of
documents and/or up to a certain transfer limit.  IIRC it's fairly
limited, though, in that you can only apply per-IP limits to _every_
virtual host - i.e. in the global httpd.conf context.

A more fine-grained solution (from what I've read; I haven't tried it) is
mod_bwshare - this one isn't in the ports but can be found here:

http://www.topology.org/src/bwshare/

This module overcomes some of the shortfalls of mod_throttle and gives you
finer-grained control over who consumes how much bandwidth over what time
period.

 Now, my question: Is it possible to write a script that can constantly scan
 the Apache logs for the footprints of those downloaders, perhaps their
 names, HTTRACK being one I see a lot? Whenever I see one of those sessions,
 I have been able to abort it by adding a rule to the firewall to deny the
 IP address access to the server. This aborts the downloading, but I have
 seen the attempts continue constantly for a day or two, confirming
 unattended downloads.
 
 Thus, if the script could spot an offender, use the firewall to add a rule
 containing the offender's IP address, and then flush and reload the
 firewall, this would at least abort the download and free up the bandwidth
 (I already have a script that restarts the firewall).
 
 Is this possible, and how would I go about it?
If you really wanted to go down this route, there is a script someone
wrote a while back to find 'rude robots' in an httpd logfile, which you
could perhaps adapt to do dynamic filtering in conjunction with your
firewall:

http://stein.cshl.org/~lstein/talks/perl_conference/cute_tricks/log9.html

If you have any success let me know.

-- 
Jez

http://www.munk.nu/
___
[EMAIL PROTECTED] mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-questions
To unsubscribe, send any mail to [EMAIL PROTECTED]