-----Original Message-----
From: Lorenzo Dalla Vecchia <lorenzo.dallavecc...@webratio.com>
Sent: Thursday, December 15, 2022 23:42
To: users@tomcat.apache.org
Subject: HTTP/2 streams reset by Tomcat are treated as 503 errors by Apache HTTP Server

Hello.

We are having an intermittent issue in a setup with Tomcat deployed behind Apache
httpd. Some HTTP/2 requests proxied to Tomcat by httpd end with a
503 (Service Unavailable) response to the web browser client.
Tomcat version is 9.0.58 running with Java 1.8.0 in a Docker container.
Docker version is 18.09.2 and the host machine runs Ubuntu 18.04.
I only have access to the httpd logs, because enabling Tomcat logging
(org.apache.coyote.http2.level = FINE) causes the problem to go away.

The following is an excerpt of the httpd log for a failing HTTP/2 stream that
httpd opened to Tomcat:

[Thu Dec 15 21:21:25.202719 2022] [proxy_http2:debug] [pid 44:tid 139828875544320] h2_proxy_session.c(931): [remote 192.168.65.2:18095] AH03363: h2_proxy_session(219-1289): submit localhost/app/143.js -> 1
[Thu Dec 15 21:21:25.202784 2022] [proxy_http2:debug] [pid 44:tid 139828875544320] h2_proxy_session.c(345): [remote 192.168.65.2:18095] AH03343: h2_proxy_session(219-1289): sent FRAME[HEADERS[length=549, hend=1, stream=1, eos=0]]
[Thu Dec 15 21:21:25.277438 2022] [proxy_http2:debug] [pid 44:tid 139828875544320] h2_proxy_session.c(267): [remote 192.168.65.2:18095] AH03341: h2_proxy_session(219-1289): recv FRAME[HEADERS[length=120, hend=1, stream=1, eos=0]]
[Thu Dec 15 21:21:25.277441 2022] [proxy_http2:debug] [pid 44:tid 139828875544320] h2_proxy_session.c(267): [remote 192.168.65.2:18095] AH03341: h2_proxy_session(219-1289): recv FRAME[DATA[length=0, flags=1, stream=1, padlen=0]]
[Thu Dec 15 21:21:25.277445 2022] [proxy_http2:debug] [pid 44:tid 139828875544320] h2_proxy_session.c(267): [remote 192.168.65.2:18095] AH03341: h2_proxy_session(219-1289): recv FRAME[RST_STREAM[length=4, flags=0, stream=1]]
[Thu Dec 15 21:21:25.277448 2022] [proxy_http2:debug] [pid 44:tid 139828875544320] h2_proxy_session.c(560): [remote 192.168.65.2:18095] AH03360: h2_proxy_session(219-1289): stream=1, closed, err=8
[Thu Dec 15 21:21:25.277449 2022] [proxy_http2:debug] [pid 44:tid 139828875544320] h2_proxy_session.c(1352): [remote 192.168.65.2:18095] AH03364: h2_proxy_sesssion(219-1289): stream(1) closed (touched=0, error=8)

This is a similar excerpt from a successful HTTP/2 stream between the same
servers:

[Wed Dec 14 15:31:19.516817 2022] [proxy_http2:debug] [pid 42:tid 140518005270272] h2_proxy_session.c(931): [remote 192.168.65.2:18095] AH03363: h2_proxy_session(212-197): submit localhost/app/1543.js -> 3
[Wed Dec 14 15:31:19.516849 2022] [proxy_http2:debug] [pid 42:tid 140518005270272] h2_proxy_session.c(345): [remote 192.168.65.2:18095] AH03343: h2_proxy_session(212-197): sent FRAME[HEADERS[length=59, hend=1, stream=3, eos=0]]
[Wed Dec 14 15:31:19.569569 2022] [proxy_http2:debug] [pid 42:tid 140518005270272] h2_proxy_session.c(267): [remote 192.168.65.2:18095] AH03341: h2_proxy_session(212-197): recv FRAME[HEADERS[length=33, hend=1, stream=3, eos=0]]
[Wed Dec 14 15:31:19.569575 2022] [proxy_http2:debug] [pid 42:tid 140518005270272] h2_proxy_session.c(267): [remote 192.168.65.2:18095] AH03341: h2_proxy_session(212-197): recv FRAME[DATA[length=0, flags=1, stream=3, padlen=0]]
[Wed Dec 14 15:31:19.569576 2022] [proxy_http2:debug] [pid 42:tid 140518005270272] h2_proxy_session.c(560): [remote 192.168.65.2:18095] AH03360: h2_proxy_session(212-197): stream=3, closed, err=0
[Wed Dec 14 15:31:19.569577 2022] [proxy_http2:debug] [pid 42:tid 140518005270272] h2_proxy_session.c(1352): [remote 192.168.65.2:18095] AH03364: h2_proxy_sesssion(212-197): stream(3) closed (touched=0, error=0)
[Wed Dec 14 15:31:19.569753 2022] [proxy_http2:debug] [pid 42:tid 140518005270272] h2_proxy_session.c(267): [remote 192.168.65.2:18095] AH03341: h2_proxy_session(212-197): recv FRAME[RST_STREAM[length=4, flags=0, stream=3]]

In both cases, the request is a GET with an "If-Modified-Since" header that
should result in a 304 (Not Modified) response, so it is correct that there is
neither a request body nor a response body.

The sequence of exchanged frames is:
- httpd sends a HEADERS frame,
- Tomcat sends a HEADERS frame,
- Tomcat sends a DATA frame with length 0 (correct) and the END_STREAM flag (1),
- Tomcat sends a RST_STREAM frame.
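
To make the last step concrete, here is a minimal Java sketch of what a RST_STREAM frame looks like on the wire, following the layout in RFC 7540 (a 9-byte frame header with type 0x3, then a 4-byte error code). The class `RstStreamFrame` and its methods are hypothetical helpers for illustration, not Tomcat's frame writer. The point is that the only thing distinguishing a CANCEL reset from a NO_ERROR reset is the final four bytes.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Sketch of an HTTP/2 RST_STREAM frame (RFC 7540, sections 4.1 and 6.4).
// Hypothetical helper for illustration; this is not Tomcat's frame writer.
public final class RstStreamFrame {
    static final int FRAME_HEADER_LENGTH = 9;
    static final byte TYPE_RST_STREAM = 0x3;
    // Error codes from RFC 7540, section 7.
    static final int NO_ERROR = 0x0;
    static final int CANCEL = 0x8;

    // Encodes a complete RST_STREAM frame: 9-byte header + 4-byte error code.
    static byte[] encode(int streamId, int errorCode) {
        int payloadLength = 4; // the RST_STREAM payload is always exactly 4 octets
        ByteBuffer buf = ByteBuffer.allocate(FRAME_HEADER_LENGTH + payloadLength); // big-endian
        buf.put((byte) (payloadLength >>> 16)); // 24-bit length field
        buf.put((byte) (payloadLength >>> 8));
        buf.put((byte) payloadLength);
        buf.put(TYPE_RST_STREAM);
        buf.put((byte) 0);                      // RST_STREAM defines no flags (flags=0 in the log)
        buf.putInt(streamId & 0x7FFFFFFF);      // reserved high bit cleared
        buf.putInt(errorCode);
        return buf.array();
    }

    // Reads the error code back out of an encoded RST_STREAM frame.
    static int errorCode(byte[] frame) {
        return ByteBuffer.wrap(frame, FRAME_HEADER_LENGTH, 4).getInt();
    }

    public static void main(String[] args) {
        byte[] cancel = encode(1, CANCEL);
        System.out.println(Arrays.toString(cancel)); // [0, 0, 4, 3, 0, 0, 0, 0, 1, 0, 0, 0, 8]
        System.out.println("error code: " + errorCode(cancel)); // error code: 8
    }
}
```

In the failing log this corresponds to `RST_STREAM[length=4, flags=0, stream=1]`, and httpd reports the trailing error code as `err=8`.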

I think the problem is related to the RST_STREAM frame.
In the failing case, httpd reads it before it declares the stream closed, and
the frame carries the CANCEL error code (err=8, error=8).
In the successful case, httpd reads it after the stream has been declared
closed, so the log does not show its error code. Either the code is NO_ERROR (0),
or it is still CANCEL (8) but arrives too late for httpd to change the outcome
of the stream, which it has already recorded as error=0.
The streams that end with a non-zero error code on the httpd side are
precisely the ones that result in a 503 error for the client.

The problem looks very timing-sensitive. I can only reproduce it with Tomcat
HTTP/2 logging disabled, without a Java debugger attached to Tomcat, and with
real network latency.
I created an automated test case that uses a browser to download 20 files
concurrently and repeats every couple of seconds.
- With httpd and Tomcat on two different cloud machines, without Tomcat logging,
  the problem occurs within 30 seconds at most.
- With httpd and Tomcat on my machine, or with Tomcat logging/debugging enabled,
  I gave up trying to reproduce the problem after 10 hours.

The problem also appears to be sensitive to the response being a 304 with no
body. If I disable caching, the problem becomes much harder to reproduce but is
still possible (I'd say about 20 times less likely).

My guess is that the RST_STREAM frame sent by Tomcat always contains the
CANCEL code, and that there is a race condition with httpd closing the stream.
When the RST_STREAM arrives before the stream is closed, httpd interprets the
CANCEL code as an error condition and reports it to the client as a 503 error.
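
The suspected race can be written down as a small toy model. This is a hypothetical sketch that only mirrors the order of the log lines in the two excerpts; it is not derived from the mod_proxy_http2 source, and the rule "any non-zero recorded error becomes a 503" is an assumption taken from the observation above. The class `ResetRaceModel` and its names are illustrative.

```java
import java.util.Arrays;
import java.util.List;

// Toy model of the suspected race on the httpd side. Hypothetical: it mirrors
// the order of the log lines above, not the actual mod_proxy_http2 source.
public final class ResetRaceModel {
    enum Event { END_STREAM_RECEIVED, RST_STREAM_RECEIVED, STREAM_CLOSED }

    // Failing log: RST_STREAM is read before httpd declares the stream closed.
    static final List<Event> RST_BEFORE_CLOSE = Arrays.asList(
            Event.END_STREAM_RECEIVED, Event.RST_STREAM_RECEIVED, Event.STREAM_CLOSED);
    // Successful log: the stream is closed first, RST_STREAM is read afterwards.
    static final List<Event> RST_AFTER_CLOSE = Arrays.asList(
            Event.END_STREAM_RECEIVED, Event.STREAM_CLOSED, Event.RST_STREAM_RECEIVED);

    // Returns the error code recorded at the moment the stream is closed.
    static int recordedError(List<Event> order, int rstErrorCode) {
        int error = 0;
        for (Event e : order) {
            if (e == Event.RST_STREAM_RECEIVED) {
                error = rstErrorCode;   // reset seen while the stream is still open
            } else if (e == Event.STREAM_CLOSED) {
                return error;           // outcome is fixed; later frames are ignored
            }
        }
        return error;
    }

    // Assumption: any non-zero recorded error is turned into a 503 for the browser.
    static int statusSentToBrowser(List<Event> order, int rstErrorCode, int upstreamStatus) {
        return recordedError(order, rstErrorCode) == 0 ? upstreamStatus : 503;
    }

    public static void main(String[] args) {
        int cancel = 0x8;
        int noError = 0x0;
        System.out.println(statusSentToBrowser(RST_BEFORE_CLOSE, cancel, 304));  // 503
        System.out.println(statusSentToBrowser(RST_AFTER_CLOSE, cancel, 304));   // 304
        System.out.println(statusSentToBrowser(RST_BEFORE_CLOSE, noError, 304)); // 304
    }
}
```

In this model a NO_ERROR reset never produces a 503, whichever order the frames are processed in, which would be consistent with the test result after recompiling with NO_ERROR.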

I looked at the sources of the HTTP/2 connector and found this code around
line 100 of org.apache.coyote.http2.StreamProcessor:

    StreamException se = new StreamException(
            sm.getString("streamProcessor.cancel", stream.getConnectionId(),
                    stream.getIdAsString()), Http2Error.CANCEL, stream.getIdAsInt());
    stream.close(se);

This seems to be the only place where the connector sends CANCEL, apart from
the forced closing of the connection in
org.apache.coyote.http2.Http2UpgradeHandler.close().

I don't know HTTP/2 well enough to say that sending CANCEL here is wrong,
partly because I'm not sure I understand the situation the connector is
handling at this point. However, could it be that the correct code here should
be Http2Error.NO_ERROR?
I recompiled the class with that change, and my test case ran for 12 hours
without a single failure.

There is also this bug report for the Go http2 package:
https://github.com/golang/go/issues/19948
It looks very similar to the problem we see here (except that, in that case,
the client is the browser itself).
The people there conclude that the CANCEL error code is mostly intended for
clients and that servers should end "useless" streams with NO_ERROR.
They also quote a section of the RFC
(https://www.rfc-editor.org/rfc/rfc7540#section-8.1), but I am not sure whether
it applies to the Tomcat case.
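
For reference, RFC 7540, section 8.1 says that a server which has sent a complete response (a frame with END_STREAM) before the client has finished sending its request MAY request that the client abort transmission of the request without error by sending RST_STREAM with NO_ERROR, and that clients MUST NOT discard responses as a result of receiving such a RST_STREAM. In both excerpts, the HEADERS frame sent by httpd carries eos=0, which suggests (an assumption, not confirmed from the Tomcat side) that httpd had not yet half-closed its side when Tomcat finished the response. The following hypothetical decision rule sketches that reading; `EarlyResponseReset` and `resetCode` are illustrative names, not Tomcat code.

```java
// Hypothetical server-side rule for choosing the RST_STREAM error code, following
// RFC 7540, section 8.1. This is not Tomcat's StreamProcessor logic.
public final class EarlyResponseReset {
    static final int NO_ERROR = 0x0;
    static final int CANCEL = 0x8;
    static final int NO_RESET = -1; // sentinel: no RST_STREAM needs to be sent

    // responseComplete: the server has sent a frame with END_STREAM.
    // requestComplete:  the client has sent a frame with END_STREAM.
    static int resetCode(boolean responseComplete, boolean requestComplete) {
        if (responseComplete && requestComplete) {
            return NO_RESET; // both directions ended: the stream is already closed
        }
        if (responseComplete) {
            // Section 8.1: after a complete response, ask the client to stop
            // sending the rest of the request "without error".
            return NO_ERROR;
        }
        // Assumption: the server is abandoning a response it has not finished,
        // i.e. the stream is no longer needed, which is what CANCEL means (section 7).
        return CANCEL;
    }

    public static void main(String[] args) {
        // Situation in the logs: httpd's HEADERS had eos=0, Tomcat's DATA had END_STREAM.
        System.out.println(resetCode(true, false)); // 0 (NO_ERROR)
    }
}
```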

What do you think? Could this be a rare Tomcat bug, an httpd bug, or
something else?
Please let me know if you need more information.

Thanks in advance for your help.
Regards.
--
Lorenzo