Chris,

Thank you for the completeness. I always miss the mark on the right level of detail (too little, too much) in my postings.
> On Dec 11, 2020, at 11:31 AM, Christopher Schultz
> <ch...@christopherschultz.net> wrote:
>
> Rob,
>
> On 12/9/20 23:58, Rob Sargent wrote:
>> My apologies if this is too vague to warrant consideration.
>
> It is vague, but we can always ask questions :)
>
>> In the recent past I managed a naked port, a Selector, a
>> ThreadPoolExecutor and friends (and it worked well enough...) but a
>> dear and knowledgeable friend suggested embedding Tomcat and using
>> HTTP.[3]
>
> Managing that yourself can be a pain. The downside to using Tomcat
> is that you have to use HTTP. But maybe that can work in your favor
> in certain cases, especially if you have other options (e.g. upgrade
> from HTTP to WebSocket after connection).
>
>> I have that working, one request at a time.
>
> Great.
>
>> Is it advisable and practical to (re)establish a ThreadPoolExecutor,
>> queue, etc. as a Tomcat-accessible "Resource" with JNDI lookup, and
>> have my servlet pass the work off to the Executor's queue?
>
> I don't really understand this at all. Are you asking how to
> mitigate a self-inflicted DoS because you have so many incoming
> connections?
>
> If you have enough hardware to satisfy the requests, and your usage
> pattern is as you suggest, then you will mostly have one or two huge
> requests in-flight at any time, and some large number of smaller
> (faster?) requests also in-flight at the same time.
>
> Again, no problem. You are constrained only by the resources you
> have available:
>
> 1. Memory
> 2. Maximum connections
> 3. Maximum threads (in your executor / request-processor thread pool)

For the large request, the middleware (my impl, or the Tomcat layer) takes the payload from the client and writes "lots" of records to the database. Do I want that save() call in the servlet, or should I queue it up for some other handler? All on the same hardware, but that frees up the servlet. In the small client (my self-made DoS), there's only a handful of writes, but it's still faster to hand that memory to a queue and let the servlet go back to the storm. That's the thinking behind the question of accessing a ThreadPoolExecutor via JNDI. I know my existing impl does queue jobs (so the load is greater than the capacity to handle requests). I worry that without off-loading, Tomcat would just spin up more servlet threads and exhaust resources. I can lose a client, but would rather not lose the server (that loses all clients...).
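For concreteness, here is roughly the shape I have in mind, as an untested sketch. I've parked a bounded pool in the ServletContext rather than wiring up a real JNDI ObjectFactory, and every name in it (PoolBootstrap, savePool, SaveServlet, /save) is made up:

    // PoolBootstrap.java -- create one bounded pool for the whole webapp
    import java.util.concurrent.*;
    import javax.servlet.*;
    import javax.servlet.annotation.WebListener;

    @WebListener
    public class PoolBootstrap implements ServletContextListener {
        public void contextInitialized(ServletContextEvent sce) {
            // Bounded queue + bounded threads: a storm of requests cannot
            // exhaust memory or spin up unlimited workers.
            ExecutorService pool = new ThreadPoolExecutor(
                    4, 8, 60, TimeUnit.SECONDS,
                    new ArrayBlockingQueue<Runnable>(1000),
                    new ThreadPoolExecutor.AbortPolicy()); // reject when full
            sce.getServletContext().setAttribute("savePool", pool);
        }
        public void contextDestroyed(ServletContextEvent sce) {
            ((ExecutorService) sce.getServletContext()
                    .getAttribute("savePool")).shutdown();
        }
    }

    // SaveServlet.java -- hand the payload off and free the servlet thread
    import java.util.concurrent.*;
    import javax.servlet.annotation.WebServlet;
    import javax.servlet.http.*;

    @WebServlet("/save")
    public class SaveServlet extends HttpServlet {
        protected void doPost(HttpServletRequest req, HttpServletResponse resp)
                throws java.io.IOException {
            // Read the whole payload and answer the client before this
            // thread goes back to the pool (readAllBytes is Java 9+).
            byte[] payload = req.getInputStream().readAllBytes();
            ExecutorService pool = (ExecutorService)
                    getServletContext().getAttribute("savePool");
            try {
                pool.submit(() -> save(payload)); // db work off the servlet thread
                resp.setStatus(HttpServletResponse.SC_ACCEPTED);
            } catch (RejectedExecutionException e) {
                resp.setStatus(HttpServletResponse.SC_SERVICE_UNAVAILABLE);
            }
        }
        private void save(byte[] payload) { /* the db writes */ }
    }

The bounded queue plus AbortPolicy is my answer to my own exhaust-resources worry: when the queue fills, the servlet answers 503 instead of piling up work.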
> If you have data that doesn't fit into byte[MAXINT] then maybe you
> don't want to try to handle it all at once. That's an application
> design decision, and if gzipping helps you in the short term, then
> great. My recommendation would be to look at ways of handling that
> request in a streaming fashion instead of buffering everything up in
> memory. The overall performance of your application will likely
> improve because of that change.

Re-working the structure of the payload (breaking it up) is an option, but not a pleasant one :)

>> [2] Given infinite EC2 capacity there would be tens of thousands of
>> jobs started at once. Realistic AWS capacity constraints limit this
>> to hundreds of instances from a queue of thousands. The duration of
>> any instance varies from hours to days. But the payload is simple,
>> under 5K bytes.
>
> If you are using AWS, then you can load-balance between any number
> of back-end Tomcat instances. The lb just has to decide which
> back-end instance to use. Sometimes lbs make bad decisions and you
> get all the load on a single node. That's bad because (a) one node
> is overloaded and (b) the other nodes are under-utilized. It's up to
> you to figure out how to get your load distributed in an equitable
> way.

Not anxious to add more Tomcat instances. I can manually throttle both types of requests for now.

> Back to the problem you are actually trying to solve.
>
> Are these "small requests" in any way a problem? Do they arrive
> frequently, and are they handled quickly? If so, then you can
> probably mostly just ignore them. It's the huge requests that are
> (likely) the problem.
>
> If you want to hand off control of a request to another thread for
> processing, there are a few ways to do that I can think of:
>
> 1. new Thread(new Runnable() { /* your stuff */ }).start();
>
> This is bad for a few reasons:
>
> 1a. Unlimited threads created by remote clients? Bad.
>
> 1b. The call to Thread.start() returns immediately and the servlet's
> execution ends. There is no way to reply to the client, and if you
> haven't read all their input, Bad Things will happen in Tomcat.
> (Like, your request and response objects will be re-used and you'll
> observe mass chaos.)
>
> 2. sharedExecutor.submit(new Runnable() { /* your stuff */ });
>
> This is bad for the same reason as 1b above, but it does not suffer
> from 1a. 1a is now replaced by:
>
> 2a. Unlimited jobs submitted by remote clients? Bad.
>
> 3. Use servlet async processing.
>
> I think this is probably ideal for your use-case. When you go into
> asynchronous mode, the request-processing thread is allowed to go
> back and service other requests, so you get a more responsive
> server, at least from your clients' perspectives.
>
> The bad news is that asynchronous mode requires that you completely
> change the way you think about communicating with a client. Instead
> of reading the input until you are satisfied, performing your
> business logic, then writing to the client until you are done, you
> have to subscribe to I/O events and handle them all appropriately.
> If you get it wrong, you can make a mess of things.

Here's where my ignorance will really shine: are you talking about HttpClient.sendAsync, or about a Tomcat mode of operation? The clients die as soon as they send the request. I wait for an OK, but don't really have to. (It would be nice if I could catch statusCode() != 200 and write it to a file (as I used to), but I can lose a client or two occasionally. Release 2.1 :) ) There's very little two-way communication: the AWS queue starts clients; clients ask for data to work on, do a bunch of simulations, and send results to the middleware; the middleware makes the db calls. The client does not know about the db.
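In case you mean the servlet-side machinery (the Servlet 3 AsyncContext) rather than anything in the client: my rough, untested picture is below. It only defers the processing (the input is still read on the request thread), so it dodges the full subscribe-to-I/O-events rewrite, which as I understand it would use ReadListener/WriteListener. Names invented:

    // AsyncSaveServlet.java -- untested sketch of Servlet 3 async mode
    import java.io.IOException;
    import javax.servlet.AsyncContext;
    import javax.servlet.annotation.WebServlet;
    import javax.servlet.http.*;

    @WebServlet(urlPatterns = "/simulate", asyncSupported = true)
    public class AsyncSaveServlet extends HttpServlet {
        protected void doPost(HttpServletRequest req, HttpServletResponse resp)
                throws IOException {
            byte[] payload = req.getInputStream().readAllBytes();
            AsyncContext ctx = req.startAsync();
            ctx.setTimeout(0); // no container timeout; use with care
            // doPost returns here and the request thread goes back to
            // Tomcat's pool; the job finishes the exchange later.
            ctx.start(() -> {
                HttpServletResponse r = (HttpServletResponse) ctx.getResponse();
                try {
                    save(payload);
                    r.setStatus(HttpServletResponse.SC_OK);
                } catch (Exception e) {
                    r.setStatus(HttpServletResponse.SC_INTERNAL_SERVER_ERROR);
                } finally {
                    ctx.complete(); // releases the connection
                }
            });
        }
        private void save(byte[] payload) { /* the db writes */ }
    }

Since my clients die right after sending anyway, the earlier hand-off-and-202 shape may serve me better than holding the exchange open like this.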
> It would help to understand the nature of what has to happen with
> the data once it's received by your servlet. Does the processing
> take a long time, or is it mostly the data upload that takes
> forever? Is there a way for the client to cut up the work in any
> way? Are there other ways for the client to get a response? Making
> an HTTP connection and then waiting 18 hours for a response is ...
> not how HTTP is supposed to work.
>
> Long ago I worked on a product where we had to perform some long
> operation on the server, and client browsers would time out. (This
> was a web browser, so they have timeouts of like 60 seconds or so.
> They aren't custom clients where you just say "wait 18 hours for a
> response.")

Your "Job" example seems along the lines of get-it-off-the-servlet, which again points back to my current queue handler, I think.

> This allows the server to make progress even if the client times
> out, or loses a network connection, or power, or whatever. It also
> frees up the network connection used to submit the initial job so
> the server can accept other connections. And you don't have to use
> servlet async and re-write your whole process.
>
> I don't know if any of this helps, but I think it will get you
> thinking in a more HTTP-style way rather than a connection-oriented
> service like the one you had originally implemented.

Helps a ton. Very thankful for the indulgence. I hope I've given useful responses.

Next up is SSL, one of the reasons I must switch from my naked-socket impl.

rjs

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org