Dan,

You are doing well to keep up -- we keep throwing lots at you. I think we
all find this issue so bizarre we really want to figure it out. We are,
even if it doesn't look like it, slowly eliminating some potential causes.

I've put a few comments inline, and then a larger update on some testing I
was doing this morning to try to reproduce what you are observing in your
live system.

Questions / suggestions for Dan marked with @Dan

On Sat, Aug 16, 2025 at 2:31 PM Chuck Caldarale <n82...@gmail.com> wrote:

>
> > On 2025 Aug 16, at 12:44, Daniel Schwartz <d...@danielgschwartz.com>
> wrote:
> >
> > From: Chuck Caldarale <n82...@gmail.com>
> > Sent: Saturday, August 16, 2025 1:33 PM
> > To: Tomcat Users List <users@tomcat.apache.org>
> > Subject: Re: [EXTERNAL EMAIL] How to access a REST service
> >
> >> Immediately after doing this, the message copied below started
> appearing repeatedly in the server.log and my website stopped working---the
> DB evidently became inaccessible.  However, Glassfish was still running,
> and as soon as I set the maximum pool size back to 1000, these messages
> stopped coming and the website started working again.
> >
> > Indicative of an orphaned connection (resource leak).
> >
> > DGS: Why is this?  What do those messages have to do with a resource
> leak?
>
>
> When the pool is at its maximum and all connections are in use, GlassFish
> will hold requests for a configurable amount of time (max-wait, defaults to
> 60 seconds), and only abort the request and log the situation after the
> caller has been suspended for that amount of time. Since your web service’s
> response time is very quick, this indicates the only allocated connection
> has stayed in-use for over a minute - something leaked it.
>
> When the maximum pool size is set to your usual value, GlassFish
> eventually cleans things up because it periodically discards all existing
> connections and acquires new ones just in case they have become unusable
> from the DB point of view.
>

Just to add that there is another situation, though it's unlikely given the
default HTTP thread count maximums. It could be that requests are arriving
faster than they can be processed -- e.g. 100 requests per 10 ms with an
average response time of 10 ms means roughly 100 connections in use at once,
which would overflow a pool limited to 50. This was why I asked about your
settings for the HTTP thread limits.
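
The back-of-the-envelope arithmetic for that scenario is Little's law
(average concurrency = arrival rate x average response time). A tiny sketch
using the hypothetical numbers above (the rates and pool limit are from the
example, not from Dan's actual system):

```java
public class PoolMath {
    // Little's law: average concurrent requests = arrival rate * response time.
    static double concurrentRequests(double arrivalsPerSec, double responseSec) {
        return arrivalsPerSec * responseSec;
    }

    public static void main(String[] args) {
        double arrivalsPerSec = 10_000; // 100 requests every 10 ms = 10,000/s
        double responseSec = 0.010;     // 10 ms average response time
        int poolLimit = 50;             // hypothetical pool maximum

        double concurrent = concurrentRequests(arrivalsPerSec, responseSec);
        System.out.println(concurrent);              // ~100 connections in flight
        System.out.println(concurrent > poolLimit);  // the pool would overflow
    }
}
```

At those rates, roughly 100 connections are needed on average -- double a
50-connection pool -- even with nothing leaking.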

I will check Glassfish 4.1 a bit later today, but with Glassfish 7.0.25
("out of the box") the limit on HTTP processing threads is 5, and without
increasing that, it's not possible to have more than 5 typical requests
being processed at once. That means that if each request never acquires
more than one connection (which is the case in all the examples you have
provided), connection pool usage should never grow beyond 5.

So, unless you (or the deployment you are using) have increased the HTTP
processing thread limits from the defaults to a value larger than the
connection pool's maximum size, you shouldn't have seen an issue originally
if all the connections are being released to the pool. The "peak value", as
tracked in Glassfish, should never exceed that thread count unless a
connection is not being returned to the pool.

Thus the conclusion is that there is a leak somewhere, but where, I'm not
sure at this point.
This is where I think we need to focus -- possibly by dialing back your
settings, and also by looking at what's going on with the robot requests --
are they causing faults that aren't visible?
(more on this at the end)

@Dan: It would be a really good idea (as has been said, and acknowledged)
to wrap all the connection / statement / result-set acquisitions in
try-with-resources. Not only will it rule out leaks from those going
forward, it will also simplify your code and make it easier to read. I
would do this sooner rather than later, so we can rule out leaks from those
activities.
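
To make the pattern concrete, here's a minimal sketch of a
try-with-resources JDBC access method. The database pieces here are
stand-ins (java.lang.reflect.Proxy stubs that just record close() calls),
purely so the sketch runs without a real database -- the shape of
getCountryList is the part that matters, and the method/table names are
hypothetical:

```java
import java.lang.reflect.Proxy;
import java.sql.*;
import java.util.*;

public class TwrDemo {
    static final List<String> closed = new ArrayList<>();

    // Proxy-backed stand-ins so the pattern can run without a real database.
    @SuppressWarnings("unchecked")
    static <T> T stub(Class<T> iface, String name) {
        return (T) Proxy.newProxyInstance(iface.getClassLoader(),
            new Class<?>[]{iface},
            (proxy, method, args) -> {
                switch (method.getName()) {
                    case "close": closed.add(name); return null;
                    case "next": return false; // empty result set
                    case "executeQuery":
                        return stub(ResultSet.class, "ResultSet");
                    case "prepareStatement":
                        return stub(PreparedStatement.class, "PreparedStatement");
                    default: return null;
                }
            });
    }

    // The pattern itself: every resource declared in the header is closed
    // automatically, in reverse order, even if an exception is thrown.
    static List<String> getCountryList(Connection conn) throws SQLException {
        List<String> countries = new ArrayList<>();
        try (PreparedStatement ps =
                 conn.prepareStatement("SELECT name FROM countries");
             ResultSet rs = ps.executeQuery()) {
            while (rs.next()) {
                countries.add(rs.getString(1));
            }
        }
        return countries;
    }

    public static void main(String[] args) throws SQLException {
        closed.clear();
        try (Connection conn = stub(Connection.class, "Connection")) {
            getCountryList(conn);
        }
        System.out.println(closed); // ResultSet, PreparedStatement, Connection
    }
}
```

Each resource is closed in reverse declaration order, even if an exception
is thrown mid-loop -- which is exactly what rules out leaks on error paths.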



>
> > So this is very interesting, and points to a possible line of
> investigation. If the web crawlers are throwing garbage at your app, that
> could possibly be resulting in losing track of connections, statements,
> result sets, etc. You should carefully check your error paths and make sure
> that all DB-related resources are being properly disposed of when invalid
> input is received.
> >
> > DGS: I'm fairly sure that all connections and prepared statements are
> being closed, but will look again.  This would be happening in the code
> fragment that tries to retrieve the list of holidays with calculated dates
> (not the countries).  This is somewhat complex and could have problems.
>
>
> Your GetCountryList code from a previous message doesn’t dispose of the
> ResultSet object; depending on the implementation details of the JDBC
> driver and pool, closing the result set may not be required, but it’s
> always good practice to do so. Since the ResultSet object contains a
> reference to the Connection (and vice-versa), it may be interfering with
> reuse of the connection until a garbage collection happens.
>


Okay, so while I was in between things this morning I did a whole bunch of
testing.

With Glassfish 7.0.25, with JDK 21, and the latest PostgreSQL JDBC driver,
I was unable to reproduce the "leak" problem using code very similar to
Dan's.

I tried all of the following:
- enabling and disabling the JDBC object wrapping in Glassfish
- using the "singleton" data source or retrieving one each request
- not closing the Statement or the ResultSet / or closing them
- not closing the Connection or closing it

The only situation where I could reproduce a leak (and very quickly) was
failing to close the Connection, i.e. removing it from the
try-with-resources block. I could not get failing to close the ResultSet or
the Statement to trigger a "leak" in the connection pool -- but that
doesn't mean those can be ignored, only that they did not, by themselves,
cause the issue in my test setup.

I was not, however, running any "real" queries. I was just running
"SELECT 1" each time and returning the integer to the caller, so no data
was coming from the client that could trigger other potential failures.

My next plan is to switch to the MySQL JDBC driver (as I noted, this is
what Dan is using) and run the same tests, to see if maybe that driver
behaves differently.

@Dan: Could you share the exact version of the JDBC driver you are using?

And then, I'll try rolling back to Glassfish 4.1.x and see if that makes a
difference.

@Dan: Is there a minor version, or is it 4.1.latest?


@Dan: It would also be a good idea to verify that incoming parameters have
valid values without using a DB query (if that is possible), so that the
robots don't trigger DB connections at all. Ultimately this is exposed to a
DoS-type attack. You can also employ rate limiting to help avoid such
problems.
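
As a hedged illustration of that idea (the parameter name and whitelist
here are hypothetical, not from Dan's actual code): a cheap syntactic check
followed by an in-memory lookup rejects crawler garbage before any pool
connection is ever acquired:

```java
import java.util.Set;

public class ParamCheck {
    // Hypothetical whitelist; in practice this could be loaded once at
    // startup rather than queried from the DB on every request.
    private static final Set<String> KNOWN_COUNTRIES = Set.of("US", "DE", "FR", "JP");

    static boolean isValidCountry(String param) {
        return param != null
            && param.matches("[A-Z]{2}")        // cheap syntactic check first
            && KNOWN_COUNTRIES.contains(param); // then an in-memory lookup, no DB
    }

    public static void main(String[] args) {
        System.out.println(isValidCountry("US"));              // true
        System.out.println(isValidCountry("US'; DROP TABLE")); // false
        System.out.println(isValidCountry(null));              // false
    }
}
```

Anything a robot sends that fails this check can get an immediate 400
response without ever touching the connection pool.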

@Dan: I think we also need to change your printing to stdout to use the
Glassfish logger. The problem with printing to stdout is that only the HTTP
client making the request that triggers those messages will see them. This
could be masking errors in requests where you aren't doing the testing
yourself. (I could be wrong about where stdout output goes, though -- so if
you do see those messages in the logs, ignore this item. I will validate it
shortly myself in Glassfish 7.0.25.)
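
A minimal sketch of switching from System.out.println to
java.util.logging, which is what Glassfish's own logging is built on. The
logger name and message are hypothetical, and the in-memory handler exists
only so the sketch can show what was logged -- in a real servlet you'd just
call the logger and let the server's handlers route it to server.log:

```java
import java.util.logging.*;

public class LogDemo {
    private static final Logger LOG = Logger.getLogger("com.example.holidays");
    static String lastMessage; // captured here only so the sketch is self-checking

    public static void main(String[] args) {
        LOG.setUseParentHandlers(false); // keep this demo off the console
        LOG.addHandler(new Handler() {
            final Formatter fmt = new SimpleFormatter();
            @Override public void publish(LogRecord r) {
                lastMessage = fmt.formatMessage(r); // expands {0}-style params
            }
            @Override public void flush() {}
            @Override public void close() {}
        });

        // Instead of: System.out.println("holiday lookup failed for id=" + id)
        LOG.log(Level.WARNING, "holiday lookup failed for id={0}", 42);
        System.out.println(lastMessage);
    }
}
```

Messages logged this way end up in server.log with a timestamp and level,
regardless of which client triggered them.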

@Dan: Have you been able to get a rough count of the request rate you are
seeing on your server? That would give us some idea whether an overload
situation is even possible, or whether we can rule it out.


Robert
(who is clearly spending way too much time on Dan's issue, but is very
puzzled and wants to get to the bottom of it, and as a result has now used
Glassfish for the first time... :) )
