Hi all,
we are currently observing a really bizarre problem on a customer system.
Our software runs a number of microservices on individual Tomcats, which we
front with an Apache HTTPD (2.4.x) reverse proxy using mod_jk to route the
requests by context. There is one exception, though: one of the microservices,
which we added to the stack at a later point in time, uses websockets, which
are not supported over the AJP protocol, so for that one we use
mod_proxy_balancer instead.
We put the ProxyPass etc. rules for mod_proxy_balancer in front of the
directives related to mod_jk and we have been mostly fine with this approach
for a few years now. We have two sets of balancer definitions for
mod_proxy_balancer, each with its associated rules: one for regular HTTP
traffic, the other for websocket traffic ("ws:" and "wss:", respectively).
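To make the setup concrete, here is a rough sketch of what such a configuration looks like. All backend host names, ports, and worker names below are hypothetical, not our actual config:

```apache
# Two balancers for microservice Z: one for plain HTTP, one for websockets.
<Proxy "balancer://z-http">
    BalancerMember "http://backend1:8080"
    BalancerMember "http://backend2:8080"
</Proxy>
<Proxy "balancer://z-ws">
    BalancerMember "ws://backend1:8080"
    BalancerMember "ws://backend2:8080"
</Proxy>

# Websocket requests are detected via the Upgrade header and proxied
# to the ws balancer; everything else under /z goes to the http balancer.
RewriteEngine On
RewriteCond %{HTTP:Upgrade} =websocket [NC]
RewriteRule ^/z/(.*) balancer://z-ws/z/$1 [P,L]
ProxyPass        /z balancer://z-http/z
ProxyPassReverse /z balancer://z-http/z

# The mod_jk mounts for the other microservices come after these rules.
JkMount /a/* worker_a
JkMount /b/* worker_b
JkMount /c/* worker_c
```
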
Let's name the microservices that are handled by mod_jk A, B, and C, and let's
name the one handled by mod_proxy_balancer Z. Let's further assume that their
request contexts are /a, /b, /c and /z, respectively.
Now to the current customer problem: the customer started experiencing very
erratic system behaviour. In particular, requests meant for one of the
microservices A-C handled by mod_jk would randomly get 404 responses. For an
affected user, this situation would usually persist for a few seconds, and
reloading wouldn't resolve it. At the same time, other users accessing the
very same microservice had no problem; pretty much all users were affected
from time to time.
We did several troubleshooting sessions that turned up nothing. At some point,
we started monitoring all the traffic between the HTTPD and the Tomcats with
tcpdump, and that is where we found the bizarre thing:
When we ran tcpdump filtered to show only the traffic between the HTTPD and
microservice Z (the one handled by mod_proxy_balancer), we sometimes saw
requests that, judging by the request URL (/a, /b, /c), were clearly meant for
one of the OTHER microservices (A-C), yet showed up in the traffic to
microservice Z. Naturally, microservice Z has no idea what to do with these
requests and responds with 404.
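For reference, a capture along these lines makes the stray request URLs visible. This is just an illustrative command, assuming Z's Tomcat listens on port 8080 (not our actual setup):

```shell
# Capture traffic towards the Z backend and print the HTTP request lines,
# so requests for /a, /b or /c arriving at Z stand out immediately.
tcpdump -i any -A -s 0 'tcp dst port 8080' | grep -E '^(GET|POST|PUT|DELETE) /'
```
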
What else might be relevant:
- our microservices are stateless, so we can scale horizontally if we want. On
that particular system, we have at least two instances of each microservice
(A-C and Z)
- the installation is spread across multiple nodes
- the nodes run on Linux
- Docker is not used ;-)
- we have never seen this problem on any other system
- we haven't seen this problem on the customer's test system, but here usage
patterns are different
- the requests with 404 responses wouldn't show up in the HTTPD's access log
(where "normal" 404 requests DO show).
- the customer had recently updated from a version of our product that uses
Apache 2.4.34 to one using 2.4.41
- disabling the microservice Z (= no more balancer workers for
mod_proxy_balancer) would resolve the problem
- putting the rules for mod_proxy_balancer after those of mod_jk (and adding an
exclusion for /z there, because one of the other microservices actually listens
on the root context) would NOT change a thing
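The reordered variant we tried looked roughly like this. The worker names are hypothetical, and the JkUnMount is the exclusion mentioned above, needed because one mod_jk worker serves the root context:

```apache
# mod_jk mounts first, with /z carved out of the root-context worker...
JkMount   /* worker_root
JkUnMount /z/* worker_root

# ...then the mod_proxy_balancer rule for Z.
ProxyPass        /z balancer://z-http/z
ProxyPassReverse /z balancer://z-http/z
```
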
From experience, we are pretty sure that the problem is somewhere on our side.
;-)
- One thought is that a bug in microservice Z, triggered only by this
customer's particular use of our product, causes the erratic behaviour of the
HTTPD/mod_proxy_balancer. Maybe we are doing something wrong that messes up the
connection keepalive between Apache and Tomcat, causing requests to go to the
wrong backend?
- Or maybe it is related to the Apache version update (2.4.34 to 2.4.41)? But
why are other installations with the same version not affected?
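If the keepalive theory is worth testing, mod_proxy has a per-worker switch to turn off backend connection reuse. A sketch (hypothetical member names; this costs performance, but if the stray requests disappear, connection reuse is implicated):

```apache
# Force a fresh backend connection per request for the Z balancer,
# ruling out requests being sent down a stale reused connection.
<Proxy "balancer://z-http">
    BalancerMember "http://backend1:8080" disablereuse=On
    BalancerMember "http://backend2:8080" disablereuse=On
</Proxy>
```
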
Any ideas where we should start looking?
Regards
J