All,

On 5/26/22 15:49, Christopher Schultz wrote:
Rainer,

On 5/16/22 13:48, Christopher Schultz wrote:
Rainer,

I've been looking into this a little more in my production environment.

These errors are not super common, but there seems to be a steady trickle of errors from my two services that have human users. I see 0 errors for my API-based services, which makes me think that I don't have any performance issues... I probably have human users disappearing for random reasons.

The errors in mod_jk.log look like this:

[Sun May 15 04:19:15.643 2022] [5859:139625025315392] [info] ajp_process_callback::jk_ajp_common.c (2077): (myworker) Writing to client aborted or client network problems [Sun May 15 04:19:15.644 2022] [5859:139625025315392] [info] ajp_service::jk_ajp_common.c (2773): (myworker) sending request to tomcat failed (unrecoverable), because of client write error (attempt=1) [Sun May 15 04:19:15.646 2022] [5859:139625025315392] [info] service::jk_lb_worker.c (1595): service failed, worker myworker is in local error state [Sun May 15 04:19:15.646 2022] [5859:139625025315392] [info] service::jk_lb_worker.c (1614): unrecoverable error 200, request failed. Client failed in the middle of request, we can't recover to another instance. [Sun May 15 04:19:15.646 2022] [5859:139625025315392] [info] jk_handler::mod_jk.c (2984): Aborting connection for worker=myworker

(Note that the last message "Aborting connection for worker=myworker" may have a bug; my actual worker name has a name containing a hyphen (-) and only the text before the hyphen is being emitted in that error message.)

Anyway, when researching these errors, it would be helpful to me to know which requests are failing. By looking at the access_log, I only see a single request logged for 04:19:15 on that server so it's probably the right one, but httpd says that the response code is 302 instead of e.g. 50x for an error.

When we log these kinds of errors, it would be great to know a few things IMO:

1. What was the URL of the request
2. How long did the client wait for the response before we found we couldn't write to the stream (or, conversely, the start-timestamp of the request as well as the timestamp of the error, which I think is already in the log itself)

I see the place in the code where the error is generated, but I'm not familiar enough with the code to know how to add that kind of thing. The function in question (ajp_process_callback) has a pointer to a jk_ws_service_t structure. Is it as simple as also logging like this?

   /* convert start-time to a string */
   char[32] timestamp;
   apr_strftime(timestamp, NULL, 32, "%Y-%m-%d %H:%M:%S", r->r->request_time);
   /* emit a detailed log message */
   jk_log(l, JK_LOG_INFO,
          "(%s) Request to (%s %s) started at %s,%ld",
          ae->worker->name, r->method, r->req_uri, timestamp, r->r->request_time.tm_usec);

Does anyone think this might be generally useful?

It looks like I needed a few more things in order to get this to work:

   {
     char *timestamp = malloc(32);
     apr_time_exp_t *timerec = malloc(sizeof(apr_time_exp_t));
     apache_private_data_t *ap = (apache_private_data_t*)r->ws_private;
     apr_time_exp_gmt(timerec, ap->r->request_time);
     apr_strftime(timestamp, NULL, 32, "%Y-%m-%d %H:%M:%S", timerec);
     /* emit a detailed log message */
     jk_log(l, JK_LOG_INFO,
            "(%s) Request to (%s %s) started at %s",
            ae->worker->name, r->method, r->req_uri, timestamp);
     free(timerec);
     free(timestamp);
   }

The compiler wouldn't let me use an automatically-allocated char[32] so I had to malloc it. I also had to convert from long to apr_time_exp_t which required another structure/malloc as well.

Finally, I had to cast the r->ws_private to apache_private_data_t* which required me to copy/paste the definition of apache_private_data_t from mod_jk.h into jk_ajp_common.c -- just as a test for now.

Obviously, this works only for Apache httpd.

I have it running on one of my web servers for now -- hoping to catch an error and see this additional logging to see if it's helpful to me. (LOL just segfaulted; time to go back and look.)

What would be the best way to get the "start time" of the request in a platform-independent way? Maybe just expand the jk_ws_service structure to include it?

Here is my latest code which hasn't segfaulted in a while:

{
  /* emit a detailed log message */
  char timestamp[32];     // For formatted timestamp string
  apr_time_exp_t timerec; // Required for apr_strftime
  apr_size_t retsize;     // Required for apr_strftime
  apache_private_data_t *ap = (apache_private_data_t*)r->ws_private;

apr_time_exp_gmt(&timerec, ap->r->request_time); // Convert request_time to apr_time_exp_t apr_strftime(timestamp, &retsize, 32, "%Y-%m-%d %H:%M:%S", &timerec); // Format the timestamp
  if(0 == retsize) {
    strcpy(timestamp, "(overflow)");
  }

  jk_log(l, JK_LOG_INFO,
         "(%s) Request to (%s %s%s%s) started at %s,%ld",
         ae->worker->name, r->method, r->req_uri,
         r->query_string ? "?" : "",
         r->query_string ? r->query_string : "",
         timestamp, timerec.tm_usec);
}

There are no more calls to malloc in there, so the compiler is handling the memory for all the locals I'm inventing in this code block.

Back to my original investigation... the small number or errors thus far seem to indicate pathological behavior on the user's part rather than any fault in my own application. In most cases, the (same?) user launches a bunch of requests to the same URI and most of them fail. I suspect it's a person multi-clicking a link to a specific resource. Based upon the URI and our user patterns, it's not likely mobile clients.

-chris

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org

Reply via email to