All,
On 5/26/22 15:49, Christopher Schultz wrote:
Rainer,
On 5/16/22 13:48, Christopher Schultz wrote:
Rainer,
I've been looking into this a little more in my production environment.
These errors are not super common, but there seems to be a steady
trickle of errors from my two services that have human users. I see 0
errors for my API-based services, which makes me think that I don't
have any performance issues... I probably have human users
disappearing for random reasons.
The errors in mod_jk.log look like this:
[Sun May 15 04:19:15.643 2022] [5859:139625025315392] [info]
ajp_process_callback::jk_ajp_common.c (2077): (myworker) Writing to
client aborted or client network problems
[Sun May 15 04:19:15.644 2022] [5859:139625025315392] [info]
ajp_service::jk_ajp_common.c (2773): (myworker) sending request to
tomcat failed (unrecoverable), because of client write error (attempt=1)
[Sun May 15 04:19:15.646 2022] [5859:139625025315392] [info]
service::jk_lb_worker.c (1595): service failed, worker myworker is in
local error state
[Sun May 15 04:19:15.646 2022] [5859:139625025315392] [info]
service::jk_lb_worker.c (1614): unrecoverable error 200, request
failed. Client failed in the middle of request, we can't recover to
another instance.
[Sun May 15 04:19:15.646 2022] [5859:139625025315392] [info]
jk_handler::mod_jk.c (2984): Aborting connection for worker=myworker
(Note that the last message "Aborting connection for worker=myworker"
may have a bug; my actual worker name has a name containing a hyphen
(-) and only the text before the hyphen is being emitted in that error
message.)
Anyway, when researching these errors, it would be helpful to me to
know which requests are failing. By looking at the access_log, I only
see a single request logged for 04:19:15 on that server so it's
probably the right one, but httpd says that the response code is 302
instead of e.g. 50x for an error.
When we log these kinds of errors, it would be great to know a few
things IMO:
1. What was the URL of the request
2. How long did the client wait for the response before we found we
couldn't write to the stream (or, conversely, the start-timestamp of
the request as well as the timestamp of the error, which I think is
already in the log itself)
I see the place in the code where the error is generated, but I'm not
familiar enough with the code to know how to add that kind of thing.
The function in question (ajp_process_callback) has a pointer to a
jk_ws_service_t structure. Is it as simple as also logging like this?
/* convert start-time to a string */
char[32] timestamp;
apr_strftime(timestamp, NULL, 32, "%Y-%m-%d %H:%M:%S",
r->r->request_time);
/* emit a detailed log message */
jk_log(l, JK_LOG_INFO,
"(%s) Request to (%s %s) started at %s,%ld",
ae->worker->name, r->method, r->req_uri, timestamp,
r->r->request_time.tm_usec);
Does anyone think this might be generally useful?
It looks like I needed a few more things in order to get this to work:
{
char *timestamp = malloc(32);
apr_time_exp_t *timerec = malloc(sizeof(apr_time_exp_t));
apache_private_data_t *ap = (apache_private_data_t*)r->ws_private;
apr_time_exp_gmt(timerec, ap->r->request_time);
apr_strftime(timestamp, NULL, 32, "%Y-%m-%d %H:%M:%S", timerec);
/* emit a detailed log message */
jk_log(l, JK_LOG_INFO,
"(%s) Request to (%s %s) started at %s",
ae->worker->name, r->method, r->req_uri, timestamp);
free(timerec);
free(timestamp);
}
The compiler wouldn't let me use an automatically-allocated char[32] so
I had to malloc it. I also had to convert from long to apr_time_exp_t
which required another structure/malloc as well.
Finally, I had to cast the r->ws_private to apache_private_data_t* which
required me to copy/paste the definition of apache_private_data_t from
mod_jk.h into jk_ajp_common.c -- just as a test for now.
Obviously, this works only for Apache httpd.
I have it running on one of my web servers for now -- hoping to catch an
error and see this additional logging to see if it's helpful to me. (LOL
just segfaulted; time to go back and look.)
What would be the best way to get the "start time" of the request in a
platform-independent way? Maybe just expand the jk_ws_service structure
to include it?
Here is my latest code which hasn't segfaulted in a while:
{
/* emit a detailed log message */
char timestamp[32]; // For formatted timestamp string
apr_time_exp_t timerec; // Required for apr_strftime
apr_size_t retsize; // Required for apr_strftime
apache_private_data_t *ap = (apache_private_data_t*)r->ws_private;
apr_time_exp_gmt(&timerec, ap->r->request_time); // Convert
request_time to apr_time_exp_t
apr_strftime(timestamp, &retsize, 32, "%Y-%m-%d %H:%M:%S", &timerec);
// Format the timestamp
if(0 == retsize) {
strcpy(timestamp, "(overflow)");
}
jk_log(l, JK_LOG_INFO,
"(%s) Request to (%s %s%s%s) started at %s,%ld",
ae->worker->name, r->method, r->req_uri,
r->query_string ? "?" : "",
r->query_string ? r->query_string : "",
timestamp, timerec.tm_usec);
}
There are no more calls to malloc in there, so the compiler is handling
the memory for all the locals I'm inventing in this code block.
Back to my original investigation... the small number or errors thus far
seem to indicate pathological behavior on the user's part rather than
any fault in my own application. In most cases, the (same?) user
launches a bunch of requests to the same URI and most of them fail. I
suspect it's a person multi-clicking a link to a specific resource.
Based upon the URI and our user patterns, it's not likely mobile clients.
-chris
---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org