> On Aug 2, 2017, at 8:33 PM, Shawn Heisey <apa...@elyograg.org> wrote:
>
> IMHO, intentionally causing connections to fail when a limit is exceeded
> would not be a very good idea. When the rate gets too high, the first
> thing that happens is all the requests slow down. The slowdown could be
> dramatic. As the rate continues to increase, some of the requests
> probably would begin to fail.
No, this is a very good idea. It is called “load shedding” or “fail fast”. Gracefully dealing with overload is an essential part of system design.

At Netflix, with a pre-Jetty Solr (war file running under Tomcat), we took down 40 front end servers with slow response times from the Solr server farm. We tied up all the front end threads waiting on responses from the Solr servers. That left no front end threads available to respond to incoming HTTP requests. It was not a fun evening.

To fix this, we configured the Citrix load balancer to overflow to a different server when the outstanding back-end requests hit a limit. The overflow server was a virtual server that immediately returned a 503. That would free up front end connections and threads in an overload condition. The users would get a “search unavailable” page, but the rest of the site would continue to work. Unfortunately, the AWS load balancers don’t offer anything like this, ten years later.

The worst case version of this is a stable congested state. It is pretty easy to put requests into a queue (connection/server) that are guaranteed to time out before they are serviced. If you have 35 requests in the queue, a 1 second service time, and a 30 second timeout, those requests are already dead when you put them on the queue.

I learned about this when I worked with John Nagle at Ford Aerospace. I recommend his note “On Packet Switches with Infinite Storage” (1985) for the full story. It is only eight pages long, but packed with goodness.

https://tools.ietf.org/html/rfc970

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)
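P.S. The fail-fast rule in the queue example above can be sketched in a few lines. This is just an illustration (names and numbers are mine, matching the 1 s service time and 30 s timeout in the example, not any Solr or load-balancer API): if the expected wait for a new request already exceeds its timeout, shed it immediately (return a 503) instead of enqueueing a request that is dead on arrival.

```python
# Hypothetical fail-fast admission check. A request whose expected wait
# (queue depth x service time) exceeds the client timeout will time out
# before it is serviced, so reject it up front rather than queue it.

SERVICE_TIME = 1.0   # seconds per request (assumed, as in the example)
TIMEOUT = 30.0       # client timeout in seconds (assumed, as in the example)

def admit(pending: int, service_time: float = SERVICE_TIME,
          timeout: float = TIMEOUT) -> bool:
    """Return True if a newly arriving request could still be served in time."""
    expected_wait = pending * service_time
    return expected_wait < timeout

# With 35 requests already queued at 1 s each, a new arrival waits ~35 s
# but times out at 30 s: it is already dead, so send the 503 now.
print(admit(pending=35))   # False: shed the request
print(admit(pending=10))   # True: can be served within the timeout
```

The same check is what the Citrix overflow rule did at the load balancer: past a fixed outstanding-request limit, fail fast instead of letting the queue grow into a stable congested state.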