Rob,

On 12/11/20 18:52, Rob Sargent wrote:
Chris,

This is _so_ helpful.

On 12/11/20 3:00 PM, Christopher Schultz wrote:
Rob,

On 12/11/20 15:00, Rob Sargent wrote:
> [huge snip]

Your “Job” example seems along the lines of get-it-off-the-servlet,
which again points back to my current queue handler I think.

Yes, I think so. So let's get back to your original idea -- which I think is a good one -- to use a shared queue to manage the jobs.

Just to be clear, the servlet is going to reply to the client ASAP by saying "I have accepted this job and will do my best to complete it", or it will return an error (see below), or it will refuse a connection (see below). Sound okay so far?

[My servlet] takes the payload from the client and writes “lots” of
records in the database.  Do I want that save() call in the servlet
or should I queue it up for some other handler. All on the same
hardware, but that frees up the servlet.
If the client doesn't care about job-status information, then fire-and-forget is a reasonable methodology. You may find that at some point, they will want to get some job-status information. You could implement that later. Version 2.2, maybe?

Yeah, my clients are only visible through the AWS console currently. Any "progress/dashboard" won't show up 'til version 2.345


On the other hand, if you can process some of the request in a streaming way, then you can be writing to your database before your client is done sending the request payload. You can still do that with fire-and-forget, but it requires some more careful handling of the streams and stuff like that.
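
For example, a minimal sketch of the streaming idea (assumptions: the request body is a JSON array of records, and "DataRecord" and saveRecord() are hypothetical stand-ins for your own types):

import java.io.IOException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import com.fasterxml.jackson.databind.MappingIterator;
import com.fasterxml.jackson.databind.ObjectMapper;

public class StreamingSaveServlet extends HttpServlet {
    // Placeholder record shape; your real payload types go here.
    public static class DataRecord {
        public String id;
        public String value;
    }

    private final ObjectMapper mapper = new ObjectMapper();

    @Override
    protected void doPost(HttpServletRequest req, HttpServletResponse resp)
            throws IOException {
        // Deserialize and store each record as it arrives instead of
        // buffering the entire payload first.
        try (MappingIterator<DataRecord> it =
                mapper.readerFor(DataRecord.class).readValues(req.getInputStream())) {
            while (it.hasNext()) {
                saveRecord(it.next()); // hypothetical per-record database write
            }
        }
        resp.setStatus(200);
    }

    private void saveRecord(DataRecord r) {
        // database write goes here
    }
}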

The one thing you cannot do is retain a reference to the request (response, etc.) after your servlet's service() method ends. Well, unless you go async but that's a whole different thing which doesn't sound like what you want to do, now that I have more info.

Calling save() from the servlet would tie-up the request-processing thread until the save completes. That's where you get your 18-hour response times, which is not very HTTP-friendly.
Certainly don't want to pay for 18 EC2 hours of idle.

So your clients spin-up an EC2 instance just to send the request to your server? That sounds odd.

Avoiding calling save() from the servlet requires that you fully read the request payload before queuing the save() call, bundled up with your data, into a thread pool. (Well, there are some tricks you could use, but they are a little dirty and may not buy you much.)

In the small client (my self-made DOS), there’s only a handful of
writes, but still faster to hand that memory to a queue and let the
servlet go back to the storm.
I would make everything work the same way unless there is a compelling reason to have different code paths.

The two payloads are impls of a base class. Jackson/ObjectMapper unravels them to Type. Type.save();

Okay, so it looks like Type.save() is what needs to be called in the separate thread (well, submitted to a job scheduler; just get it off the request processing thread so you can return 200 response to the client).

That’s the thinking behind the question of accessing a ThreadPoolExecutor via JNDI.  I know my existing impl does queue jobs (so the load is greater than the capacity to handle requests).  I worry that without off-loading, Tomcat would just spin up more servlet threads and exhaust resources.  I can lose a client, but would rather not lose the server (that loses all clients...)

Agreed: rejecting a single request is preferred over the service coming down -- and all its in-flight jobs with it.

So I think you want something like this:

servlet {
  post {
    // Buffer all our input data
    long bufferSize = request.getContentLengthLong();
    if(bufferSize > Integer.MAX_VALUE || bufferSize < 0) {
      bufferSize = 8192; // Reasonable default?
    }
    ByteArrayOutputStream buffer = new ByteArrayOutputStream((int)bufferSize);

    InputStream in = request.getInputStream();
    int count;
    byte[] buf = new byte[8192];
    while(-1 != (count = in.read(buf))) {
        buffer.write(buf, 0, count);
    }

    // All data read: tell the client we are good to go
    Job job = new Job(buffer);
    try {
      sharedExecutor.submit(job); // Fire and forget

      response.setStatus(200); // Ok
    } catch (RejectedExecutionException ree) {
      response.setStatus(503); // Service Unavailable
    }
  }
}

This is working:

       protected void doPost(HttpServletRequest req, HttpServletResponse resp)
           /*throws ServletException, IOException*/ {
         lookupHostAndPort();

         Connection conn = null;
         try {

           ObjectMapper jsonMapper =
               JsonMapper.builder().addModule(new JavaTimeModule()).build();
           jsonMapper.setSerializationInclusion(Include.NON_NULL);

           try {

             AbstractPayload payload =
                 jsonMapper.readValue(req.getInputStream(), AbstractPayload.class);
             logger.error("received payload");
             String redoUrl = String.format("jdbc:postgresql://%s:%d/%s",
                 getDbHost(), getDbPort(), getDbName(req));
             Connection copyConn = DriverManager.getConnection(redoUrl,
                 getDbRole(req), getDbRole(req) + getExtension());

So it's here you cannot pool the connections? What about:

    Context ctx = new InitialContext();
    DataSource ds = (DataSource) ctx.lookup("java:comp/env/jdbc/" + getJNDIName(req));

Then you can define your per-user connection pools in JNDI and get the benefit of connection-pooling.
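
For example, a rough sketch (getJNDIName(req) is hypothetical here, AbstractPayload is your own class, and closing the connection just hands it back to the pool; adjust the throws clause to match whatever write() actually declares):

import java.sql.Connection;
import java.sql.SQLException;
import javax.naming.InitialContext;
import javax.naming.NamingException;
import javax.servlet.http.HttpServletRequest;
import javax.sql.DataSource;

void writeWithPooledConnection(HttpServletRequest req, AbstractPayload payload)
        throws NamingException, SQLException {
    DataSource ds = (DataSource) new InitialContext()
        .lookup("java:comp/env/jdbc/" + getJNDIName(req));

    // try-with-resources returns the connection to the pool when done
    try (Connection copyConn = ds.getConnection()) {
        payload.setConnection(copyConn);
        payload.write();
    }
}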

             payload.setConnection(copyConn);
             payload.write();

Is the above call the one that takes hours?

             // HERE THE CLIENT IS WAITING FOR THE SAVE. Though there can be
             // a lot of data, COPY is blindingly fast.

Maybe the payload.write() is not slow. Maybe? After this you don't do anything else...

             resp.setContentType("text/plain");
             resp.setStatus(200);
             resp.getOutputStream().write("SGS_OK".getBytes());
             resp.getOutputStream().flush();
             resp.getOutputStream().close();
           }
           //Client can do squat at this point.
           catch (com.fasterxml.jackson.databind.exc.MismatchedInputException mie) {
             logger.error("transform failed: " + mie.getMessage());
             resp.setContentType("text/plain");
             resp.setStatus(461);
             String emsg = String.format("PAYLOAD NOT SAVED\n%s\n", mie.getMessage());
             resp.getOutputStream().write(emsg.getBytes());
             resp.getOutputStream().flush();
             resp.getOutputStream().close();
           }
         }
         catch (IOException | SQLException ioe) {
           etc }

Obviously, the job needs to know how to execute itself (making it Runnable means you can use the various Executors Java provides). Also, you need to decide what to do about creating the executor.
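
For instance, a Job along these lines (a sketch: it assumes the job re-parses the buffered bytes and calls write() the same way your doPost() does now, so from the servlet you would submit something like new Job(buffer.toByteArray(), mapper)):

import java.io.ByteArrayInputStream;
import com.fasterxml.jackson.databind.ObjectMapper;

public class Job implements Runnable {
    private final byte[] data;          // the fully-buffered request body
    private final ObjectMapper mapper;  // shared; ObjectMapper is thread-safe for reads

    public Job(byte[] data, ObjectMapper mapper) {
        this.data = data;
        this.mapper = mapper;
    }

    @Override
    public void run() {
        try {
            AbstractPayload payload =
                mapper.readValue(new ByteArrayInputStream(data), AbstractPayload.class);
            // obtain a Connection here (pooled or otherwise), then:
            // payload.setConnection(conn);
            payload.write();
        } catch (Exception e) {
            // The client is long gone at this point; all you can do is log it.
            e.printStackTrace();
        }
    }
}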

I used the ByteArrayOutputStream above to avoid the complexity of re-scaling buffers in example code. If you have huge buffers and you need to convert to byte[] at the end, then you are going to need 2x heap space to do it. Yuck. Consider implementing the auto-re-sizing byte-array yourself and avoiding ByteArrayOutputStream.
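
If you do roll your own, a bare-bones sketch might look like this: grow the array as you read, then hand the job the backing array plus the count of valid bytes, so there is no second copy at the end (names are made up; tune the growth policy to taste):

import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

public class GrowableBuffer {
    private byte[] data;
    private int length;

    public GrowableBuffer(int initialCapacity) {
        data = new byte[Math.max(initialCapacity, 16)];
    }

    public void readFully(InputStream in) throws IOException {
        int count;
        // There is always at least one free byte when read() is called.
        while (-1 != (count = in.read(data, length, data.length - length))) {
            length += count;
            if (length == data.length) {
                data = Arrays.copyOf(data, data.length * 2); // double when full
            }
        }
    }

    public byte[] array()  { return data; }   // backing array; may have slack at the end
    public int    length() { return length; } // number of valid bytes
}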

There isn't anything magic about JNDI. You could also put the thread pool directly into your servlet:

servlet {
  ThreadPoolExecutor sharedExecutor;
  constructor() {
    sharedExecutor = new ThreadPoolExecutor(...);
  }
  ...
}
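
If you go that route, build the executor so that it can actually reject work; otherwise the 503 path in the earlier sketch never fires. A sketch, with pool and queue sizes picked arbitrarily:

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import javax.servlet.http.HttpServlet;

public class SaveServlet extends HttpServlet {
    private ThreadPoolExecutor sharedExecutor;

    @Override
    public void init() {
        sharedExecutor = new ThreadPoolExecutor(
            2, 4,                            // core / max worker threads (arbitrary)
            60, TimeUnit.SECONDS,            // idle-thread keep-alive
            new ArrayBlockingQueue<>(100),   // bounded queue: full + max threads busy => reject
            new ThreadPoolExecutor.AbortPolicy()); // reject => RejectedExecutionException
    }

    @Override
    public void destroy() {
        sharedExecutor.shutdown(); // stop accepting new jobs; already-queued jobs still run
    }
}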

Yes, I see now that the single real instance of the servlet can master the sharedExecutor.

I have reliable threadpool code at hand.  I don't need to separate the job types:  In practice all the big ones are done first: they define the small ones.  It's when I'm spectacularly successful and two (2) investigators want to use the system ...

Sounds good.

But I am still confused as to what is taking 18 hours. None of the calls above look like they should take a long time, given your comments.

If you want to put those executors into JNDI, you are welcome to do so, but there is no particular reason to. If it's convenient to configure a thread pool executor via some JNDI injection something-or-other, feel free to use that.

But ultimately, you are just going to get a reference to the executor and drop the job on it.

Next up is SSL.  One of the reasons I must switch from my naked socket impl.

Nah, you can do TLS on a naked socket. But I think using Tomcat embedded (or not) will save you the trouble of having to learn a whole lot and write a lot of code.

No thanks.
TLS should be fairly easy to get going in Tomcat as long as you already understand how to create a key+certificate.

I've made keys/certs in previous lives (not to say I understand them). I'm waiting to hear on whether or not I'll be able to self-sign etc. Talking to AWS Monday on the things security/HIPAA

AWS may tell you that simply using TLS at the load-balancer (which is fall-off-a-log easy; they will even auto-renew with an AWS-signed CA) should be sufficient for your needs. You may not have to configure Tomcat for TLS at all.

I'm sure I'll be back, but I think I can move forward.  Much appreciated.

Any time.

-chris

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org
