Chris,
On 12/2/14 4:28 PM, Chris Gamache wrote:
> Anyone ever suggested? No idea. But I'd be glad to riff on the
> subject in case it shakes some discussion loose.
>
> I haven't seen a filter like that, but I'm sure you could work it
> out. If I were implementing it I would use a Valve... Valves are
> easy to write -- just extend org.apache.catalina.valves.ValveBase
> and wire it into your XML configuration. You could probably be more
> sophisticated in your throttling, letting certain IPs or requests
> through while tarpitting others.

I think a Valve is more appropriate than a Filter, if only because it
can be installed earlier in the pipeline.

> A number of considerations; these off the top of my head--
>
> You'd have to be okay with holding the request thread open and
> making it sleep.

If this weren't done in a Valve, but deeper in the core, Tomcat could
even put the request on hold and free the thread to do other things.
This happens with the NIO connectors while the request is still being
sent by the client: the thread isn't tied up waiting on I/O.

> I guess you could serve a 503 if it were overloaded.

The question is whether this is a "throttle" intended to smooth out
data volume or request counts for a particular client [1], or a
mechanism to avoid being overwhelmed by requests. A 503 is better for
avoiding an overload, but doesn't work well for more traditional
"throttling".

> Consider your memory usage. I've read horror stories about GC
> pauses wreaking havoc. If you have a farm of Tomcats that would be
> participating, you'd need to work out a way for them to communicate
> with one another for global counter stats.

+1

[1] Identifying clients is always problematic. Proxies (AOL) and
other things can make it look like a huge number of distinct users
are coming from a single IP; IP-based client identification would
effectively treat them all as a single user for throttling purposes.
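For the archives, here is a minimal sketch of the kind of Valve
described above -- not code from this thread; the class name, the
per-IP policy, and the limits are all made up for illustration, and
the counter window/reset logic is omitted for brevity:

```java
import java.io.IOException;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

import javax.servlet.ServletException;

import org.apache.catalina.connector.Request;
import org.apache.catalina.connector.Response;
import org.apache.catalina.valves.ValveBase;

/**
 * Hypothetical tarpitting Valve: counts requests per client IP and
 * sleeps the request thread once a client exceeds a limit.
 */
public class ThrottleValve extends ValveBase {

    // per-IP request counters (never reset here; a real Valve would
    // need a sliding window and an eviction policy)
    private final ConcurrentHashMap<String, AtomicLong> counts =
            new ConcurrentHashMap<>();

    private long maxRequests = 100;   // illustrative limit
    private long delayMillis = 500;   // illustrative tarpit delay

    @Override
    public void invoke(Request request, Response response)
            throws IOException, ServletException {
        String ip = request.getRemoteAddr();
        long n = counts.computeIfAbsent(ip, k -> new AtomicLong())
                       .incrementAndGet();
        if (n > maxRequests) {
            try {
                // tarpit: hold the request thread open, as discussed
                Thread.sleep(delayMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        // pass the request along the pipeline
        getNext().invoke(request, response);
    }
}
```

It would be wired into server.xml like any other Valve, e.g. with a
<Valve className="ThrottleValve"/> element inside the Host or Context.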
This is like 16-bit Microsoft Windows programs sharing a single
time-slice per unit time while 32-bit applications each get a full
slice: bad for business (if business is getting attention from the
server!).

There are other techniques to identify clients, but they all have
their own problems:

1. Cookies. Client disables cookies. Chaos ensues.

2. TLS session id. Client re-connects. New session = new stats.

3. IP address. Proxies are a problem.

4. Authenticated user. This is likely the most effective and safest
   way, but it does require that users have a trackable session.
   Anyone wishing to subvert your throttling can do so by maintaining
   multiple logins, etc. Basically, this only works for users who
   behave themselves.

I can't think of anything else off the top of my head.

Another problem to which Chris alludes above is that of memory usage:
if you want to track clients, you'll need to know who they are. If
you expect to handle requests from 100M clients, you'll need to make
sure that whatever data structure you use to track them doesn't grow
enormous. If you wanted to create a Java class to track clients, you
might do this:

  class Client {
      String ipAddress;
      long firstVisitTimestamp;
      long totalRequestCount;
  }

Each instance of that class takes roughly 8 bytes of object header, 8
bytes for each of the two long values, and 4 bytes for the reference
to the ipAddress String. The String itself (assuming a long dotted
IPv4 address) adds another object header, 4 length bytes, and 15
characters at 2 bytes each (Java stores strings internally as UTF-16)
-- a grand total of roughly 80 bytes per client entry. With 100M
clients to track, you'll need 100M * 80 = 8,000,000,000 bytes, which
is roughly 7.5 GiB /just for client tracking/. If you hold long IPv6
addresses, it gets worse.
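One way to keep that structure from growing without bound is to cap
it with an LRU eviction policy. A sketch under assumed names (the
ClientStats class, the cap, and the fields are all hypothetical):

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical capped client tracker: a LinkedHashMap in access
 * order evicts the least-recently-seen client once the cap is
 * reached, bounding memory no matter how many distinct IPs appear.
 */
public class ClientStats {

    static final int MAX_CLIENTS = 100_000; // illustrative cap

    static class Client {
        long firstVisitTimestamp = System.currentTimeMillis();
        long totalRequestCount;
    }

    // access-order LinkedHashMap acting as an LRU cache
    private final Map<String, Client> clients =
        new LinkedHashMap<String, Client>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(
                    Map.Entry<String, Client> eldest) {
                return size() > MAX_CLIENTS;
            }
        };

    /** Records one request and returns the client's running count. */
    public synchronized long recordRequest(String ip) {
        Client c = clients.computeIfAbsent(ip, k -> new Client());
        return ++c.totalRequestCount;
    }

    public static void main(String[] args) {
        ClientStats stats = new ClientStats();
        stats.recordRequest("10.0.0.1");
        System.out.println(stats.recordRequest("10.0.0.1")); // prints 2
    }
}
```

Note that this doesn't remove the gaming problem described below --
an attacker can still flood the map with "other" clients to evict
their own entry -- it only bounds the memory cost of the attack.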
You can (and should) always cap the total number of clients that you
track, but this leads to another problem: clients can game the system
by flooding you with requests from "other" clients, turning over your
client-tracking structure and resetting their own stats.

There's also the problem of keeping all nodes in a cluster up to date
with the latest stats for a particular client. Depending upon your
client-identification scheme, this can be easy (session -- stats are
in the session; problem solved) or difficult (IP-based stats must be
broadcast to all nodes to keep them up to date). It's a non-trivial
problem.

-chris

> On Tue, Dec 2, 2014 at 12:28 PM, Leo Donahue <donahu...@gmail.com>
> wrote:
>
>> Has anyone ever suggested a configurable throttle filter as one
>> of the container provided filters in Tomcat?
>>
>> Or are people generally using the attributes in the HTTP
>> connector for limiting requests to the server for a given amount
>> of time?
>>
>> leo