On 17/01/2025 10:05, Michael wrote:
I have two applications, A and B, running in Tomcat 10.1.28 on RHEL
8.10 with Java 21 (OpenJDK Runtime Environment Red_Hat-21.0.5.0.10-1).
Application uses an AsyncContext and SSE to send messages back to an
application running in a Chrome browser. Sometimes I get an NPE with
the following stacktrace from app A:
04-Jan-2025 21:58:12.447 SEVERE [http-nio-8080-exec-11]
org.apache.catalina.core.Async
ContextImpl$RunnableWrapper.run Error
during processing of asynchronous Runnable via AsyncContext.start()
java.lang.NullPointerException: Cannot invoke
"org.apache.catalina.connector.OutputBuffer.setErrorException(java.lang.Exception)"
because "this.ob" is null
at
org.apache.catalina.connector.CoyoteOutputStream.flush(CoyoteOutputStream.java:138)
at com.a.b.c.A$2.run(A.java:123)
at
org.apache.catalina.core.AsyncContextImpl$RunnableWrapper.run(AsyncContextImpl.java:562)
at
org.apache.tomcat.util.threads.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1190)
at
org.apache.tomcat.util.threads.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:659)
at
org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:63)
at java.base/java.lang.Thread.run(Thread.java:1583)
ob.flush() has thrown an IOE at line 136 but then
ob.setErrorException() has thrown an NPE.
ob must have been set to null between those calls.
That means there must have been call to OutputBuffer.clear()
That only gets called from o.a.catalina.connector.Response.recycle()
That gets called once an async request has been completed/dispatched
All of that suggests that something detects an issue with this request
(or it just times out) which triggers the async error handling which
eventually leads to the async request being completed/dispatched.
Meanwhile, an application thread is writing to the response. If that
thread doesn't see the error until after the request has been
completed/dispatched (or doesn't check for the error or ignores it) then
things are going to start going wrong.
Because Tomcat recycles and re-uses the request and response objects
(and quite a few others), if an application retains a reference to one
those objects and continues to use it after it has been recycled pretty
much anything can go wrong.
If the application uses non-blocking IO it gets even more complicated.
Generally, applications should (even must) always implement
AsyncListener. In particular, that means preventing further reads and/or
writes after onError() or onTimeout() fire. Where non-blocking IO is
used similar requirements exist for ReadListener and WriteListener.
In a development environment, I can reproduce an IOE in
CoyoteOutputStream.flush() if I in the browser switch to another tab
while my application A is trying to send a message to it. But I cannot
reproduce the following NPE shown above.
I've also tried to figure out why ob is set to null while in
CoyoteOutputStream.flush(), but that hasn't helped me much. Anyway,
I'm surprised to get an NPE from the platform.
In various logs from nightly(?) Tomcat test runs, I've been able to
find a similar problem. See this, for instance:
https://nightlies.apache.org/tomcat/tomcat-10.1.x/logs/1309/TEST-org.apache.coyote.http2.TestAsyncError.NIO.txt
I suspect the NPE occurs more often on the CI system because it is
relatively heavily loaded. We often see that with rare errors that are
timing dependent.
Now, the next thing that happens is even more surprising to me.
Sometimes the uncaught NPE triggers an HTTP 500 response on a request
to a servlet in application B! This I've been able to reproduce in my
development environment, if I simply throw an NPE in application A
from withing the AsyncContext as shown in the stacktrace above while I
simultaneously keep requests coming in to application B. Then it
sometimes happens. I haven't been able to see what's going on in
application B when this happens, but my trace logs don't show anything
suspicious. The client making requests to application B just get's
back this:
<!doctype html><html lang="en"><head><title>HTTP Status 500 – Internal
Server Error</title><style type="text/css">body
{font-family:Tahoma,Arial,sans-serif;} h1, h2, h3, b
{color:white;background-color:#525D76;} h1 {font-size:22px;} h2
{font-size:16px;} h3 {font-size:14px;} p {font-size:12px;} a
{color:black;} .line
{height:1px;background-color:#525D76;border:none;}</style></head><body><h1>HTTP
Status 500 – Internal Server Error</h1><hr class="line"
/><p><b>Type</b> Exception Report</p><p><b>Description</b> The server
encountered an unexpected condition that prevented it from fulfilling
the request.</p><p><b>Exception</b></p><pre>java.lang.NullPointerException
</pre><p><b>Note</b> The full stack trace of the root cause is
available in the server logs.</p><hr class="line" /><h3>Apache
Tomcat/10.1.28</h3></body></html>
Most likely because the application A thread has continued to use a
request/response that should have been recycled and reused for
application B. That said, I wouldn't rule out a Tomcat bug so it is
worth continuing to look at this.
Mark
I don't know much about all this, but isn't both the NPE and it's
effect on another application's requests surprising?
For now, I've worked around the problem by explicitly catching NPEs in
this async code in application A.
Kind regards,
Michael
---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org