Chris,

On 12/2/14 4:28 PM, Chris Gamache wrote:
> Anyone ever suggested? No idea. But I'd be glad to riff on the
> subject in case it shakes some discussion loose.
> 
> I haven't seen a filter like that, but I'm sure you could work it
> out. If I were implementing it I would use a Valve... Valves are
> easy to write- just extend org.apache.catalina.valves.ValveBase and
> wire it into your xml configurations. You could probably be more
> sophisticated in your throttling, letting certain IPs or requests
> through while tarpitting others.

I think a Valve is more appropriate than a Filter, if only because it
can be installed "earlier" in the pipeline.
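To make the idea concrete, here is a minimal, hypothetical sketch of the per-client rate check that a custom Valve's invoke() might call before passing the request down the pipeline. This is plain JDK code, not a Tomcat API; the class and method names are illustrative.

```java
// Illustrative token-bucket rate limiter: each client gets a bucket that
// refills at a steady rate and allows short bursts up to `capacity`.
public class TokenBucket {
    private final double capacity;      // maximum burst size, in requests
    private final double refillPerSec;  // sustained requests per second
    private double tokens;
    private long lastRefillNanos;

    public TokenBucket(double capacity, double refillPerSec) {
        this.capacity = capacity;
        this.refillPerSec = refillPerSec;
        this.tokens = capacity;
        this.lastRefillNanos = System.nanoTime();
    }

    /** Returns true if the caller may proceed, consuming one token. */
    public synchronized boolean tryAcquire() {
        long now = System.nanoTime();
        double elapsedSec = (now - lastRefillNanos) / 1_000_000_000.0;
        lastRefillNanos = now;
        tokens = Math.min(capacity, tokens + elapsedSec * refillPerSec);
        if (tokens >= 1.0) {
            tokens -= 1.0;
            return true;
        }
        return false;
    }
}
```

In a Valve you would presumably keep a ConcurrentHashMap<String, TokenBucket> keyed by client identifier, and on a failed tryAcquire() either sleep the thread (tarpit) or let the request through unthrottled, depending on policy.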

> A number of considerations; these off the top of my head--
> 
> You'd have to be okay with holding the request thread open and
> making it sleep.

If this weren't done in a Valve, but deeper in the core, Tomcat could
even put the request on hold and free the thread to do other things.
This happens with the NIO connectors when the request is still being
sent by the client: the thread isn't tied-up waiting on IO.

> I guess you could serve a 503 if it were overloaded.

I guess the question is whether this is a "throttle" intended simply to
smooth out the data volume or request rate for a particular client [1],
or a mechanism to avoid being overwhelmed by requests. A 503 is better
for avoiding overload, but doesn't work well for more traditional
"throttling".
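Those two goals suggest different responses, which could be captured in a small policy object. This is a hypothetical sketch; the enum names and limits are illustrative, and in a Valve the REJECT_503 branch would translate into sending SC_SERVICE_UNAVAILABLE while DELAY would translate into a sleep.

```java
// Illustrative admission policy: smooth a client out with a delay below
// the hard limit, shed load with a 503 above it.
public class AdmissionPolicy {
    public enum Decision { ALLOW, DELAY, REJECT_503 }

    private final int softLimit;  // above this, start delaying (throttle)
    private final int hardLimit;  // above this, refuse with 503 (overload)

    public AdmissionPolicy(int softLimit, int hardLimit) {
        this.softLimit = softLimit;
        this.hardLimit = hardLimit;
    }

    /** Decide based on the number of requests currently in flight. */
    public Decision decide(int inFlightRequests) {
        if (inFlightRequests >= hardLimit) return Decision.REJECT_503;
        if (inFlightRequests >= softLimit) return Decision.DELAY;
        return Decision.ALLOW;
    }
}
```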

> Consider your memory usage. I've read horror stories about GC
> pauses wreaking havoc. If you have a farm of tomcats that would be
> participating you'd need to work out a way for them to communicate
> with one another for global counter stats.

+1

[1] Identifying clients is always problematic. Proxies (e.g. AOL's) and
other middleboxes can make a huge number of distinct users appear to
come from a single IP, effectively treating them all as a single user
for throttling purposes (if you use IP-based client identification).
It's like 16-bit Microsoft Windows programs sharing a single time-slice
per unit time while each 32-bit application gets a full slice of its
own: bad for business (if business is getting attention from the
server!).

There are other techniques to identify clients, but they all have
their own problems:

1. Cookies. Client disables cookies. Chaos ensues.
2. TLS session id. Client re-connects. New session = new stats.
3. IP address. Proxies are a problem.
4. Authenticated user. This is likely the most effective and safest
way, but it does require that users have a trackable session. Anyone
wishing to subvert your throttling can do so by maintaining multiple
logins, etc. Basically, this only works for users who behave themselves.
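The options above form a natural fallback chain, which might look something like this hypothetical helper (the parameter and method names are illustrative; in a Valve you would pull these values from the Request object):

```java
// Illustrative client-identification fallback: prefer the strongest
// available identity, degrading toward IP address as a last resort.
public class ClientKey {
    public static String of(String authenticatedUser, String sessionId,
                            String remoteIp) {
        if (authenticatedUser != null && !authenticatedUser.isEmpty()) {
            return "user:" + authenticatedUser;  // option 4: safest
        }
        if (sessionId != null && !sessionId.isEmpty()) {
            return "session:" + sessionId;       // options 1-2: cookie/TLS session
        }
        return "ip:" + remoteIp;                 // option 3: proxies get lumped together
    }
}
```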

I can't think of anything else off the top of my head.

Another problem to which Chris alludes above is that of memory usage:
if you want to track clients, you'll need to know who they are. If you
expect to handle requests from 100M clients, you'll need to make sure
that whatever data structure you use to track them doesn't grow
enormous. If you wanted to create a Java class to track clients, you
might do this:

class Client {
  String ipAddress;
  long firstVisitTimestamp;
  long totalRequestCount;
}

Each instance of that class takes 8 bytes for the object header, 8
bytes for each of the two long values, and 4 bytes for the object
reference to the ipAddress String; then that String (assuming a long
dotted-quad IPv4 address) takes its own object header plus a length
field plus 15 characters at either 15*2 bytes (UTF-16, which the JVM
has traditionally used internally) or 15*1 bytes (Latin-1, which JVMs
with "compact strings" can use when the characters allow; UTF-8 has
never been used for in-memory Strings), for a grand total on the order
of 80 bytes for each client entry.

With 100M clients to track, you'll need 100M * 80 = 8,000,000,000
bytes, which is roughly 7.5 GiB /just for client tracking/. If you hold
long IPv6 addresses, it gets worse.
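One way to shrink that footprint, sketched below under the assumption that IP-based identification is acceptable: pack each dotted-quad IPv4 address into a primitive int instead of holding a String per client, so the key costs 4 bytes rather than dozens. The class and method names are illustrative.

```java
// Illustrative IPv4 packing: a dotted-quad address fits exactly in an int.
public final class Ipv4 {
    public static int pack(String dottedQuad) {
        String[] p = dottedQuad.split("\\.");
        return (Integer.parseInt(p[0]) << 24)
             | (Integer.parseInt(p[1]) << 16)
             | (Integer.parseInt(p[2]) << 8)
             |  Integer.parseInt(p[3]);
    }

    public static String unpack(int packed) {
        return ((packed >>> 24) & 0xFF) + "." + ((packed >>> 16) & 0xFF)
             + "." + ((packed >>> 8) & 0xFF) + "." + (packed & 0xFF);
    }
}
```

Paired with a primitive-keyed map (or a plain long[] of counters indexed by a hash), this cuts per-entry cost dramatically, though IPv6 would still need 16 bytes per address.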

You can (and should) always cap the total number of clients that you
track, but this leads to another problem: clients can game the system
by flooding you with requests that appear to come from "other" clients,
churning your client-tracking table until their own stats are evicted
and reset.
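A cap like that can be had almost for free from the JDK: LinkedHashMap's removeEldestEntry() hook (a real JDK API) turns the map into a bounded LRU cache that evicts the least-recently-seen client when full. The cap value here is illustrative.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Bounded LRU cache for per-client stats: when the cap is exceeded,
// the least-recently-accessed entry is evicted automatically.
public class ClientStatsCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    public ClientStatsCache(int maxEntries) {
        super(16, 0.75f, true);  // accessOrder = true -> LRU iteration order
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries;
    }
}
```

Note that this plain LinkedHashMap is not thread-safe; under concurrent request threads you would need to synchronize access or use a concurrent cache instead.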

There's also the problem of keeping all nodes in a cluster up-to-date
with the latest stats for a particular client. Depending upon your
client-identification scheme, this can be easy (session -- stats are
in the session; problem solved) or difficult (IP address must be
broadcast to all nodes to keep stats up-to-date).

It's a non-trivial problem.

-chris

> On Tue, Dec 2, 2014 at 12:28 PM, Leo Donahue <donahu...@gmail.com>
> wrote:
> 
>> Has anyone ever suggested a configurable throttle filter as one
>> of the container provided filters in Tomcat?
>> 
>> Or are people generally using the attributes in the HTTP
>> connector for limiting requests to the server for a given amount
>> of time?
>> 
>> leo
>> 
> 
