RE: Database Connection Requests Initiated but Not Sent on the Wire (Some, Not All)

2024-05-29 Thread Eric Robinson

> -Original Message-
> From: Mark Thomas 
> Sent: Wednesday, May 29, 2024 10:19 AM
> To: users@tomcat.apache.org
> Subject: Re: Database Connection Requests Initiated but Not Sent on the Wire
> (Some, Not All)
>
> On 29/05/2024 16:08, Eric Robinson wrote:
>
> > I believe your assessment is correct. How hard is it to enable pooling? Can 
> > it
> be bolted on, so to speak, through changes to the app context, such that the
> webapp itself does not necessarily need to implement special code?
>
> It looks like - from the database configuration you provided earlier - there 
> is an
> option to configure the database via JNDI. If you do that with Tomcat you will
> automatically get pooling. That might be something to follow up with the
> vendor. If you go that route, I'd recommend configuring the pool to remove
> abandoned connections to avoid any issues with connection leaks.
>

In reviewing live threads with Visual VM, I note that there are apparently 
threads related to cleaning up abandoned connections, and maybe even pooling?

The threads are:

mysql-cj-abandoned-connection-cleanup (2 of those)
OkHttp Connection Pool (2 of those)
OkHttp https://ps.pndsn.com (not sure what that is)


> Not sure if all the web applications support a JNDI based configuration.
>
> 
>
> > Would the problem be relieved if the vendor stuck to one driver?
>
> Yes. That would avoid the attempt to load the "other" driver which is causing
> the delay.
>
> Mark
>
> -
> To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
> For additional commands, e-mail: users-h...@tomcat.apache.org

Disclaimer : This email and any files transmitted with it are confidential and 
intended solely for intended recipients. If you are not the named addressee you 
should not disseminate, distribute, copy or alter this email. Any views or 
opinions presented in this email are solely those of the author and might not 
represent those of Physician Select Management. Warning: Although Physician 
Select Management has taken reasonable precautions to ensure no viruses are 
present in this email, the company cannot accept responsibility for any loss or 
damage arising from the use of this email or attachments.


RE: Database Connection Requests Initiated but Not Sent on the Wire (Some, Not All)

2024-05-29 Thread Eric Robinson
Hi Mark,


> -Original Message-
> From: Mark Thomas 
> Sent: Wednesday, May 29, 2024 10:10 AM
> To: users@tomcat.apache.org
> Subject: Re: Database Connection Requests Initiated but Not Sent on the Wire
> (Some, Not All)
>
> On 29/05/2024 13:38, Eric Robinson wrote:
> >> -Original Message-
> >> From: Mark Thomas 
>
> 
>
> >> I intend to work on a patch for Tomcat that will add caching that
> >> should speed things up considerably. I hope to have something for
> >> Eric to test today but it might take me until tomorrow as I have a
> >> few other time critical things fighting to get to the top of my TODO list 
> >> at the
> moment.
> >>
> >>
> >> Moving the JDBC driver JARs from WEB-INF/lib to $CATALINA_BASE/lib
> >> may also be a short-term fix but is likely to create problems if the
> >> same JAR ever exists in both locations at the same time.
>
> Just an FYI. On further reflection, moving the JDBC driver JARs isn't going to
> help. Sorry. You'll need my fix.
>
> Assuming, of course, you are willing to test a patch to address this on a
> production system.
>

Absolutely. We and the users are ready to do what it takes.

> > That's some great sleuthing and the explanation makes a ton of sense. It
> leaves me with a couple of questions.
> >
> > If you are correct, then it follows that historic activity has been hovering
> dangerously near the threshold where this symptom would manifest. Within the
> past month, an unknown change in the system climate now causes an uptick in
> the number of DB requests/second at roughly the same time daily (with
> occasional exceptions) and the system begins to trip over its own feet. I 
> haven't
> seen anything in my Zabbix graphs that stood out as potentially problematic.
> Armed with this information, I am now taking a closer look.
>
> Ack.
>
> > The natural next question is, what changed in the application or the users'
> workflow to push activity over the threshold? We'll dig into that.
>
> Could be all sorts of things.
>
> It might just have been coincidence the first time and now the users all 
> request
> the data they need at the start of their day in case the problem happens 
> again.
> And by doing that they cause the very problem they are trying to avoid.
>

One of the webapps is related to voice reminder messages that go out to people. 
The reminders go out sometime after 9 am, which tracks with the slowdowns.

> Mark
>



RE: Database Connection Requests Initiated but Not Sent on the Wire (Some, Not All)

2024-05-29 Thread Eric Robinson
Mark,

A few other thoughts come to mind. See below.

> -Original Message-
> From: Eric Robinson 
> Sent: Wednesday, May 29, 2024 7:39 AM
> To: Tomcat Users List 
> Subject: RE: Database Connection Requests Initiated but Not Sent on the Wire
> (Some, Not All)
>
> Hi Mark,
>
>
> > -Original Message-
> > From: Mark Thomas 
> > Sent: Wednesday, May 29, 2024 5:35 AM
> > To: users@tomcat.apache.org
> > Subject: Re: Database Connection Requests Initiated but Not Sent on
> > the Wire (Some, Not All)
> >
> > On 29/05/2024 10:26, Mark Thomas wrote:
> > > On 28/05/2024 16:26, Eric Robinson wrote:
> > >
> > > 
> > >
> > >> Took a bunch of thread and heap dumps during today's painful debacle.
> > >> Will send a link to those as soon as I can.
> > >
> > > Thanks. I have them. I have taken a look and I am starting to form a
> > > theory. To help with that I have a couple of questions.
> >
> > Scratch that. I've found some further information in the data Eric
> > sent me off-list and I am now pretty sure what is going on.
> >
> > There are multiple web applications deployed on the servers. I assume
> > they are related, but it actually doesn't matter.
> >
> > At least one application is using the "new" MySQL JDBC driver:
> > com.mysql.cj.jdbc.Driver
> >
> > At least one application is using the "old" MySQL JDBC driver:
> > com.mysql.jdbc.Driver
> >
> >
> > (I've told Eric off-list which application is using which).
> >
> > There are, therefore, two drivers registered with the
> > java.sql.DriverManager
> >
> >
> > The web applications are not using connection pooling. Or, if they are
> > using it, they are using it very inefficiently. The result is that
> > there is a high volume of calls to create new database connections.
> >
> > This is problem number 1. Creating a database connection is expensive.
> > That is why the concept of database connection pooling was created.
> >
> >
> > When a new connection is created, java.sql.DriverManager iterates over
> > the list of registered drivers and
> > - tests to see if the current class loader can see the driver
> > - if yes, tests to see if that driver can service the connection url
> > - if yes, use it and exit
> > - go on to the next driver in the list and repeat
> >
> > The test to see if the current class loader can use the driver is,
> > essentially, to call Class.forName(driver.getClass(), true,
> > classloader)
> >
> > And that is problem number 2. That check is expensive if the current
> > class loader can't load that driver.
> >
> >
> > It is also problem number 3. The reason it is expensive is that class
> > loaders don't cache misses so if a web application has a large number
> > of JARs, they all get scanned every time the DriverManager tries to
> > create a new connection.
> >

Maybe a potential solution is to have the class loader cache misses? Wait, I 
see you answered that further down...
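For what it's worth, the idea is easy to sketch in isolation. This is only an illustration of negative caching, not Mark's actual Tomcat patch (Tomcat's WebappClassLoaderBase is far more involved); the class name used for the miss is just the old driver from this thread:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative sketch: a delegating class loader that remembers failed
// lookups, so a repeated miss fails immediately instead of re-scanning JARs.
public class NegativeCachingLoader extends ClassLoader {
    private final Map<String, Boolean> misses = new ConcurrentHashMap<>();

    public NegativeCachingLoader(ClassLoader parent) { super(parent); }

    @Override
    protected Class<?> loadClass(String name, boolean resolve)
            throws ClassNotFoundException {
        if (misses.containsKey(name)) {
            // Cached miss: fail fast, skipping the expensive search.
            throw new ClassNotFoundException(name + " (cached miss)");
        }
        try {
            return super.loadClass(name, resolve);
        } catch (ClassNotFoundException e) {
            misses.put(name, Boolean.TRUE); // remember the failure
            throw e;
        }
    }

    public static void main(String[] args) {
        NegativeCachingLoader cl =
            new NegativeCachingLoader(NegativeCachingLoader.class.getClassLoader());
        try { cl.loadClass("com.mysql.jdbc.Driver"); }
        catch (ClassNotFoundException first) { /* slow path: full search */ }
        try { cl.loadClass("com.mysql.jdbc.Driver"); }
        catch (ClassNotFoundException second) {
            System.out.println(second.getMessage()); // fast path: answered from cache
        }
    }
}
```

A real implementation also has to worry about invalidating the cache on redeploy, which is presumably part of what makes the Tomcat patch non-trivial.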

> >
> > The slowness occurs in the web application that depends on the second
> > JDBC driver in DriverManager's list. When a request that requires a
> > database connection is received, there is a short delay while the web
> > application tries, and fails, to load the first JDBC driver in the list.
> > Class loading is synchronized on class name being loaded so if any
> > other requests also need a database connection, they have to wait for
> > this request to finish the search for the JDBC driver before they can
> > continue. This creates a bottleneck. Requests are essentially rate
> > limited to 1 request that requires a database connection per however
> > long it takes to scan every JAR in the web application for a class
> > that isn't there. If the average rate of requests exceeds this rate
> > limit then a queue is going to build up and it won't subside until the
> > average rate of requests falls below this rate limit.
> >
> >
> >
> > Problem number 1 is an application issue. It should be using pooling.
> > It seems unlikely that we'll see a solution from the application
> > vendor and
> > - even if the vendor does commit to a fix - I suspect it will take months.
> >

I believe your assessment is correct. How hard is it to enable pooling? Can it 
be bolted on, so to speak, through changes to the app context, such that the 
webapp itself does not necessarily need to implement special code?
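If the vendor's app can read its DataSource from JNDI, pooling can indeed be bolted on in Tomcat's own configuration rather than in application code. A hedged sketch using Tomcat 9's default (DBCP2-based) resource factory; the resource name, pool sizes, and credentials are placeholders, and the host/port/database come from the connection string quoted earlier in this thread:

```xml
<!-- conf/Catalina/localhost/<appname>.xml (or the webapp's META-INF/context.xml) -->
<Context>
  <Resource name="jdbc/MobiledocDB"
            auth="Container"
            type="javax.sql.DataSource"
            driverClassName="com.mysql.cj.jdbc.Driver"
            url="jdbc:mysql://ha52a:5791/mobiledoc_791?useSSL=false"
            username="REDACTED"
            password="REDACTED"
            maxTotal="50"
            maxIdle="10"
            maxWaitMillis="10000"
            removeAbandonedOnBorrow="true"
            removeAbandonedTimeout="60"
            logAbandoned="true"/>
</Context>
```

The catch is that the webapp must then obtain connections by looking up java:comp/env/jdbc/MobiledocDB instead of calling DriverManager directly, which is exactly the part that may require vendor cooperation.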

> >
> > Problem number 2 is a JRE issue. I think there are potentially more
> > efficient ways to perform that check but that needs research as things
> > like OSGI and JPMS make class loading more complicated.

RE: Database Connection Requests Initiated but Not Sent on the Wire (Some, Not All)

2024-05-29 Thread Eric Robinson
Hi Mark,


> -Original Message-
> From: Mark Thomas 
> Sent: Wednesday, May 29, 2024 5:35 AM
> To: users@tomcat.apache.org
> Subject: Re: Database Connection Requests Initiated but Not Sent on the Wire
> (Some, Not All)
>
> On 29/05/2024 10:26, Mark Thomas wrote:
> > On 28/05/2024 16:26, Eric Robinson wrote:
> >
> > 
> >
> >> Took a bunch of thread and heap dumps during today's painful debacle.
> >> Will send a link to those as soon as I can.
> >
> > Thanks. I have them. I have taken a look and I am starting to form a
> > theory. To help with that I have a couple of questions.
>
> Scratch that. I've found some further information in the data Eric sent me 
> off-list and I am now pretty sure what is going on.
>
> There are multiple web applications deployed on the servers. I assume they
> are related, but it actually doesn't matter.
>
> At least one application is using the "new" MySQL JDBC driver:
> com.mysql.cj.jdbc.Driver
>
> At least one application is using the "old" MySQL JDBC driver:
> com.mysql.jdbc.Driver
>
>
> (I've told Eric off-list which application is using which).
>
> There are, therefore, two drivers registered with the java.sql.DriverManager
>
>
> The web applications are not using connection pooling. Or, if they are using 
> it,
> they are using it very inefficiently. The result is that there is a high 
> volume of
> calls to create new database connections.
>
> This is problem number 1. Creating a database connection is expensive.
> That is why the concept of database connection pooling was created.
>
>
> When a new connection is created, java.sql.DriverManager iterates over the 
> list
> of registered drivers and
> - tests to see if the current class loader can see the driver
> - if yes, tests to see if that driver can service the connection url
> - if yes, use it and exit
> - go on to the next driver in the list and repeat
>
> The test to see if the current class loader can use the driver is, 
> essentially, to
> call Class.forName(driver.getClass(), true, classloader)
>
> And that is problem number 2. That check is expensive if the current class
> loader can't load that driver.
>
>
> It is also problem number 3. The reason it is expensive is that class
> loaders don't cache misses so if a web application has a large number of
> JARs, they all get scanned every time the DriverManager tries to create
> a new connection.
>
>
> The slowness occurs in the web application that depends on the second
> JDBC driver in DriverManager's list. When a request that requires a
> database connection is received, there is a short delay while the web
> application tries, and fails, to load the first JDBC driver in the list.
> Class loading is synchronized on class name being loaded so if any other
> requests also need a database connection, they have to wait for this
> request to finish the search for the JDBC driver before they can
> continue. This creates a bottleneck. Requests are essentially rate
> limited to 1 request that requires a database connection per however
> long it takes to scan every JAR in the web application for a class that
> isn't there. If the average rate of requests exceeds this rate limit
> then a queue is going to build up and it won't subside until the average
> rate of requests falls below this rate limit.
>
>
>
> Problem number 1 is an application issue. It should be using pooling. It
> seems unlikely that we'll see a solution from the application vendor and
> - even if the vendor does commit to a fix - I suspect it will take months.
>
>
> Problem number 2 is a JRE issue. I think there are potentially more
> efficient ways to perform that check but that needs research as things
> like OSGI and JPMS make class loading more complicated.
>
>
> Problem number 3 is a Tomcat issue. It should be relatively easy to
> start caching misses (i.e. this class loader cannot load this class) and
> save the time spent repeatedly scanning JARs for a class that isn't there.
>
>
> I intend to work on a patch for Tomcat that will add caching that should
> speed things up considerably. I hope to have something for Eric to test
> today but it might take me until tomorrow as I have a few other time
> critical things fighting to get to the top of my TODO list at the moment.
>
>
> Moving the JDBC driver JARs from WEB-INF/lib to $CATALINA_BASE/lib may
> also be a short-term fix but is likely to create problems if the same
> JAR ever exists in both locations at the same time.
>
>
> Mark
>

That's some great sleuthing and the explanation makes a ton of sense. It leaves 
me with a couple of questions.
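Mark's walk-through of the DriverManager lookup can be reproduced in miniature: when no registered driver can service a URL, getConnection walks the entire driver list, class-loader visibility check included, before failing. A tiny standalone demo (the URL scheme is made up):

```java
import java.sql.DriverManager;
import java.sql.SQLException;

// With no driver able to service the URL, DriverManager iterates every
// registered driver (testing class-loader visibility for each) and only
// then throws. That per-call iteration is what becomes expensive when a
// webapp creates a fresh connection for every request.
public class DriverLookupDemo {
    public static void main(String[] args) {
        try {
            DriverManager.getConnection("jdbc:nosuchdb://example/test");
        } catch (SQLException e) {
            System.out.println(e.getMessage()); // "No suitable driver found for ..."
        }
    }
}
```

With a connection pool, this lookup happens once at pool creation instead of once per request, which is why pooling sidesteps problems 2 and 3 as well as problem 1.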

RE: Database Connection Requests Initiated but Not Sent on the Wire (Some, Not All)

2024-05-28 Thread Eric Robinson
Hi Mark,

See comments below.


> -Original Message-
> From: Mark Thomas 
> Sent: Tuesday, May 28, 2024 9:32 AM
> To: Tomcat Users List 
> Subject: Re: Database Connection Requests Initiated but Not Sent on the Wire
> (Some, Not All)
>
> Hi Eric,
>
> Follow-up observations and comments in-line.
>
> >> What time does this problem start?
> >
> > It typically starts around 9:15 am EDT and goes until around 10:30 am.
>
> Does that match the time of highest request load from the customer?
> Rather than a spike, I'm wondering if the problem is triggered once load
> exceeds some threshold.
>

My nginx proxy console only shows live activity and does not keep a history, 
but I can probably script something to parse the localhost_access logs and 
graph request counts on a per-minute basis. Will work on that.
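The per-minute bucketing is straightforward; a rough sketch, assuming the default localhost_access_log timestamp format "[dd/MMM/yyyy:HH:mm:ss Z]" (the sample lines below are invented):

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Count requests per minute from access-log lines by slicing the timestamp
// "[dd/MMM/yyyy:HH:mm" down to minute granularity.
public class PerMinuteCounts {
    static Map<String, Integer> count(Iterable<String> lines) {
        Map<String, Integer> perMinute = new LinkedHashMap<>();
        for (String line : lines) {
            int open = line.indexOf('[');
            if (open < 0 || line.length() < open + 18) continue;
            // 17 chars after '[' covers "dd/MMM/yyyy:HH:mm"
            String minute = line.substring(open + 1, open + 18);
            perMinute.merge(minute, 1, Integer::sum);
        }
        return perMinute;
    }

    public static void main(String[] args) {
        List<String> sample = List.of(
            "10.0.0.1 - - [29/May/2024:09:15:01 -0400] \"GET /app HTTP/1.1\" 200 512",
            "10.0.0.2 - - [29/May/2024:09:15:59 -0400] \"GET /app HTTP/1.1\" 200 512",
            "10.0.0.3 - - [29/May/2024:09:16:00 -0400] \"POST /app HTTP/1.1\" 200 128");
        count(sample).forEach((m, n) -> System.out.println(m + " " + n));
    }
}
```

In practice the sample list would be replaced by streaming the real log file (e.g. Files.lines), and the output fed to whatever graphs the trend.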

> > We finished and implemented the script yesterday, so today will be the first
> day that it produces results. It watches the catalina.out file for stuck 
> thread
> detection warnings. When the number of stuck threads exceeds a threshold,
> then it starts doing thread dumps every 60 seconds until the count drops back
> down below the threshold. The users typically do not complain of slowness 
> until
> the stuck thread count exceeds 20, and during that time the threads often take
> up to a minute or more to complete. It's too late today to change the timings,
> but if it does not produce any actionable intel, we can adjust them tonight.
>
> Let's see what that produces and go from there.
>

Took a bunch of thread and heap dumps during today's painful debacle. Will send 
a link to those as soon as I can.

> > The vendor claims that the feature uses a different server and does not send
> requests to the slow ones, so it has been re-enabled at the customer's 
> request.
> We may ask them to disable it again until we get this issue resolved.
>
> Noted.
>
> > This customer sends about 1.5 million requests to each load-balanced
> > server during a typical production day. Most other customers send much
> > less, often only a fraction of that. However, at least one customer
> > sends about 2 million to the same server, and they don't see the
> > problem. (I will check if they have the AI feature enabled.)
>
> Hmm. Whether that other customer has the AI feature enabled would be an
> interesting data point.

I will ask them right after I send this message. They are usually a little slow 
to respond.

>
> >> Can we see the full stack trace please.
> >
> > Here's one example.
>
> 
>
> >  java.lang.Throwable
> >  at
> org.apache.catalina.loader.WebappClassLoaderBase.loadClass(WebappClassLoa
> derBase.java:1252)
> >  at
> > org.apache.catalina.loader.WebappClassLoaderBase.loadClass(WebappClass
> > LoaderBase.java:1220)
>
> 
>
> That is *very* interesting. That is the start of a synch block in the class 
> loader. It
> should complete quickly. The full thread dump should tell us what is holding 
> the
> lock. If we are lucky we'll be able to tell why the lock is being held for so 
> long.
>
> We might need to reduce the time between thread dumps to figure out what
> the thread that is blocking everything is doing. We'll see.
>
> > The app has DB connection details in two places. First, it uses a database
> connection string in a .properties file, as follows. This string handles most
> connections to the DB.
> >
> > mobiledoc.DBUrl=jdbc:mysql://ha52a:5791
> > mobiledoc.DBName=mobiledoc_791?useSSL=false[remaining URL parameters garbled in the archive]
> > mobiledoc.DBUser=
> > mobiledoc.DBPassword=
>
> OK. That seems unlikely to be using connection pooling although the 
> application
> might be pooling internally.
>

Based on lots of previous observation, I don't think they are. The comms 
between the app and DB are choppy, with only about 1-5 queries per TCP 
connection. If they are pooling, they are not doing it aggressively.

> > It also has second DB config specifically for a drug database.
> >
> > 
> >
> >
> >  
> >  
> >  
> >  c:\out.log
> >
> >
> >
> >
> >
> >  
> >
> INSERT_CONTEXT_FACTORY FACTORY>
> >  INSERT_JNDI_URL
> >  INSERT_USER_NAME
> >  INSERT_PASSWORD
> >  INSERT_LOOKUP_NAME
> >  com.mysql.jdbc.Driver
> >
> jdbc:mysql://dbclust54:5791/medispan?sessionVariables=wait_timeout=2
> 8800,interactive_timeout=28800
> >  redacted
> >  redacted
> >  10
> >  5000
> >
> >
> >
> >
> >  true
> >  0
> >  1800
> >
> > 
>
> Hmm. There is a pool size setting there but we can't tell if it is being used.
>
> >> Is that Tomcat 9.0.80 as provided by the ASF?
>
> An explicit answer to this question would be helpful.
>

Didn't mean to seem evasive. Yes, it's from the ASF.


> In terms of the way forward, we need to see the thread dumps when the problem
> is happening to figure out where the blockage is happening and
> (hopefully) why.
>
> Mark

RE: Database Connection Requests Initiated but Not Sent on the Wire (Some, Not All)

2024-05-28 Thread Eric Robinson
Hi Mark,

> -Original Message-
> From: Mark Thomas 
> Sent: Tuesday, May 28, 2024 3:42 AM
> To: users@tomcat.apache.org
> Subject: Re: Database Connection Requests Initiated but Not Sent on the Wire
> (Some, Not All)
>
> Hi Eric,
>
> I have some follow-up questions in-line. I have also read the other 
> messages in
> this thread and added a couple of additional questions based on what I read in
> those threads.
>
>
> On 26/05/2024 02:58, Eric Robinson wrote:
> > One of our hosting customers is a medical practice using a commercial EMR
> running on tomcat+mysql. It has operated well for over a year, but users have
> suddenly begun experiencing slowness for about an hour at the same time
> every day.
>
> What time does this problem start?
>

It typically starts around 9:15 am EDT and goes until around 10:30 am.

> Does it occur every day of the week including weekends?
>

Most weekdays. There have been 1 or 2 weekdays when the symptom 
inexplicably did not appear. I'm not sure about weekends, as the medical 
practice does not work on those days.

> How does the slowness correlate to:
> - request volume
> - requests to any particular URL(s)?
> - requests from any particular client IP?
> - any other attribute of the request?
>

> (I'm trying to see if there is something about the requests that triggers the
> issue.)
>

We have not seen anything stand out. There are no apparent spikes in request 
volume. The slowness appears to impact all parts of the system (meaning all 
URLs). It manifests for the customer, but we have also seen it when we connect 
to the app internally, behind the firewall and reverse proxy, directly to the 
tomcat server from a workstation connected to the same switch.

> > During the slow times, we've done all the usual troubleshooting to catch the
> problem in the act. The servers have plenty of power and are not overworked.
> There are no slow database queries. Network connectivity is solid. Tomcat has
> plenty of memory. The numbers of database connections, threads, questions,
> queries, etc., remain steady, without spikes. There is no unusual disk 
> latency.
> We have not found any maintenance tasks running during that timeframe.
>
> I would usually suggest taking three thread dumps approximately 5s apart and
> then diffing them to try and spot "slow moving" threads.
>

> I see you have scripted trigger a thread dump when the slowness hits. If you
> haven't already, please configure it to capture (at least) 3 dumps
> ~5 seconds apart.
>
> (If we can spot the slow moving threads we might be able to identify what it 
> is
> that makes them slow moving.)
>

We finished and implemented the script yesterday, so today will be the first 
day that it produces results. It watches the catalina.out file for stuck thread 
detection warnings. When the number of stuck threads exceeds a threshold, then 
it starts doing thread dumps every 60 seconds until the count drops back down 
below the threshold. The users typically do not complain of slowness until the 
stuck thread count exceeds 20, and during that time the threads often take up 
to a minute or more to complete. It's too late today to change the timings, but 
if it does not produce any actionable intel, we can adjust them tonight.

> > The customer has another load-balanced tomcat instance on a different
> physical server, and the problem happens on that one, too. The servers were
> upgraded with a new kernel and packages on 4/5/24, but the issue did not
> appear until 5/6/24. The vendor enabled a new feature in the customer's
> software, and the problem appeared the next day, but they subsequently
> disabled the feature, and (reportedly) the problem did not go away.
>
> Have you confirmed that the feature really is disabled? Or was it just hidden?
>

The vendor claims that the feature uses a different server and does not send 
requests to the slow ones, so it has been re-enabled at the customer's request. 
We may ask them to disable it again until we get this issue resolved.

> Has this feature been enabled for any other customers? If yes, have they
> experienced similar issues?
>

> (It is suspicious that the issue occurred after the feature was disabled. I 
> wonder
> if some elements of that change (e.g. a database
> change) are still in place and causing issues.)
>

We agree that it is suspicious, but at this point we are forced to give it the 
side-eye. We're not aware of other customers being impacted, but (a) it's a new 
AI-based feature, so not many other customers have it, (b) it is enabled by the 
vendor directly, so we are not in the notification loop, and (c) the problem 
customer is large, with about 800 staff, whereas most other customers are much 
smaller and might not trigger the symptoms.

RE: Database Connection Requests Initiated but Not Sent on the Wire (Some, Not All)

2024-05-26 Thread Eric Robinson
Hi Chuck,

> -Original Message-
> From: Chuck Caldarale 
> Sent: Sunday, May 26, 2024 2:21 PM
> To: Tomcat Users List 
> Subject: Re: Database Connection Requests Initiated but Not Sent on the Wire
> (Some, Not All)
>
>
> > On May 25, 2024, at 20:58, Eric Robinson  wrote:
> >
> > One of our hosting customers is a medical practice using a commercial EMR
> running on tomcat+mysql. It has operated well for over a year, but users have
> suddenly begun experiencing slowness for about an hour at the same time
> every day. During the slow times, we've done all the usual troubleshooting to
> catch the problem in the act. The servers have plenty of power and are not
> overworked. There are no slow database queries. Network connectivity is solid.
> Tomcat has plenty of memory. The numbers of database connections, threads,
> questions, queries, etc., remain steady, without spikes. There is no unusual 
> disk
> latency. We have not found any maintenance tasks running during that
> timeframe.
>
>
> 
>
>
> > There are no unusual errors in the tomcat or database server logs, EXCEPT
> this one: java.sql.DriverManager.getConnection
>
>
> 
>
>
> > During the periods of slowness, we see lots of those errors along with a 
> > large
> spike in the number of stuck tomcat threads (from 1 or 2 to as high as 100). 
> It
> seems obvious that the threads are stuck because tomcat is waiting on a
> connection to the database.
>
>
> 
>
>
> > We are forced to conclude that some database connection requests are being
> initiated but are not being sent on the wire.
>
>
> Could the DB server be out of ports? (Seems unlikely, based on your debugging
> so far.)
>

We have not seen any indication of that.

> Any chance that the Tomcat process is running out of file descriptors? Or 
> ports?
>

Likewise, no indications of that.

> Can you force a garbage collection (e.g., with jconsole or similar tool) 
> during a
> slow period? If there is some limit on an OS-level resource that’s being 
> reached,
> a GC may be able to delete the Java objects that are tying up the underlying
> resources.

GC is on my list of things to try.

>
>   - Chuck
>
>



RE: Database Connection Requests Initiated but Not Sent on the Wire (Some, Not All)

2024-05-26 Thread Eric Robinson
Hi Thomas,


> -Original Message-
> From: Thomas Hoffmann (Speed4Trade GmbH)
> 
> Sent: Sunday, May 26, 2024 3:30 PM
> To: Tomcat Users List 
> Subject: AW: Database Connection Requests Initiated but Not Sent on the Wire
> (Some, Not All)
>
> Hello,
>
> > -Ursprüngliche Nachricht-
> > Von: Chuck Caldarale 
> > Gesendet: Sonntag, 26. Mai 2024 21:21
> > An: Tomcat Users List 
> > Betreff: Re: Database Connection Requests Initiated but Not Sent on
> > the Wire (Some, Not All)
> >
> >
> > > On May 25, 2024, at 20:58, Eric Robinson 
> wrote:
> > >
> > > One of our hosting customers is a medical practice using a
> > > commercial EMR
> > running on tomcat+mysql. It has operated well for over a year, but
> > users have suddenly begun experiencing slowness for about an hour at
> > the same time every day. During the slow times, we've done all the
> > usual troubleshooting to catch the problem in the act. The servers
> > have plenty of power and are not overworked. There are no slow database
> queries. Network connectivity is solid.
> > Tomcat has plenty of memory. The numbers of database connections,
> > threads, questions, queries, etc., remain steady, without spikes.
> > There is no unusual disk latency. We have not found any maintenance
> > tasks running during that timeframe.
> >
> >
> > 
> >
> >
> > > There are no unusual errors in the tomcat or database server logs,
> > > EXCEPT
> > this one: java.sql.DriverManager.getConnection
> >
> >
> > 
> >
> >
> > > During the periods of slowness, we see lots of those errors along
> > > with a large
> > spike in the number of stuck tomcat threads (from 1 or 2 to as high as
> > 100). It seems obvious that the threads are stuck because tomcat is
> > waiting on a connection to the database.
> >
> >
> > 
> >
> >
> > > We are forced to conclude that some database connection requests are
> > > being
> > initiated but are not being sent on the wire.
> >
> >
> > Could the DB server be out of ports? (Seems unlikely, based on your
> > debugging so far.)
> >
> > Any chance that the Tomcat process is running out of file descriptors? Or
> ports?
> >
> > Can you force a garbage collection (e.g., with jconsole or similar
> > tool) during a slow period? If there is some limit on an OS-level
> > resource that’s being reached, a GC may be able to delete the Java
> > objects that are tying up the underlying resources.
> >
> >   - Chuck
> >
>
>
> On the client side, the TCP connections are kept in a wait-state for usually 2
> minutes as far as I know.
> Maybe you can check how many are in this state.
>

On our server, we set things much lower to allow faster recycling of TCP 
connections...

net.ipv4.tcp_fin_timeout = 15
net.ipv4.tcp_tw_reuse = 1

> If the application doesn’t use connection pooling, then this can be the 
> problem
> itself too.

During peak production, there are a total of around 20,000 connections in 
various states, mostly TIME_WAIT. The port range is 5000-6.
dmesg, journalctl, and the messages file don't show any errors about running 
out of ports or file handles.


> TCP handshakes and logon process take a while and for performance reasons,
> DB connections are usually pooled.
>
> A stacktrace might help to see what java is doing when it enters this blocking
> state.
> Maybe you can provide a stack when the app starts blocking.
>

We are writing a script to watch for stuck threads to exceed a threshold, and 
do a thread dump when that happens.
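The trigger logic of such a watcher is simple enough to model. A toy sketch of just the thresholding step, assuming the StuckThreadDetectionValve class name appears in the catalina.out warning lines (the sample log lines are invented):

```java
import java.util.List;

// Toy model of the watcher's decision step: count stuck-thread warnings in a
// window of recent catalina.out lines and report when a threshold is crossed.
public class StuckThreadTrigger {
    // Assumed marker: the valve's class name as it appears in log output.
    static final String MARKER = "StuckThreadDetectionValve";

    static boolean shouldDump(List<String> recentLines, int threshold) {
        long stuck = recentLines.stream().filter(l -> l.contains(MARKER)).count();
        return stuck >= threshold;
    }

    public static void main(String[] args) {
        List<String> window = List.of(
            "29-May-2024 09:15:02 WARNING org.apache.catalina.valves.StuckThreadDetectionValve thread stuck",
            "29-May-2024 09:15:02 WARNING org.apache.catalina.valves.StuckThreadDetectionValve thread stuck",
            "29-May-2024 09:15:03 INFO unrelated line");
        System.out.println(shouldDump(window, 2));
    }
}
```

The real script additionally has to tail the live file and invoke jstack (or kill -3) when this returns true, then stop once the count falls back below the threshold.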

> Greetings,
> Thomas


RE: Database Connection Requests Initiated but Not Sent on the Wire (Some, Not All)

2024-05-26 Thread Eric Robinson
Hi Thomas,


> -Original Message-
> From: Thomas Hoffmann (Speed4Trade GmbH)
> 
> Sent: Sunday, May 26, 2024 2:52 AM
> To: Tomcat Users List 
> Subject: AW: Database Connection Requests Initiated but Not Sent on the Wire
> (Some, Not All)
>
> Hello Eric,
>
> > -Ursprüngliche Nachricht-
> > Von: Eric Robinson 
> > Gesendet: Sonntag, 26. Mai 2024 03:59
> > An: users@tomcat.apache.org
> > Betreff: Database Connection Requests Initiated but Not Sent on the
> > Wire (Some, Not All)
> >
> > One of our hosting customers is a medical practice using a commercial
> > EMR running on tomcat+mysql. It has operated well for over a year, but
> > users have suddenly begun experiencing slowness for about an hour at
> > the same time every day. During the slow times, we've done all the
> > usual troubleshooting to catch the problem in the act. The servers
> > have plenty of power and are not overworked. There are no slow database
> queries. Network connectivity is solid.
> > Tomcat has plenty of memory. The numbers of database connections,
> > threads, questions, queries, etc., remain steady, without spikes.
> > There is no unusual disk latency. We have not found any maintenance
> > tasks running during that timeframe.
> >
> > The customer has another load-balanced tomcat instance on a different
> > physical server, and the problem happens on that one, too. The servers
> > were upgraded with a new kernel and packages on 4/5/24, but the issue
> > did not appear until 5/6/24. The vendor enabled a new feature in the
> > customer's software, and the problem appeared the next day, but they
> > subsequently disabled the feature, and (reportedly) the problem did
> > not go away. It is worth mentioning that the servers are
> > multi-tenanted, with other customers running the same medical
> > application, but the others do not experience the slowdowns, even though
> they are on the same servers.
> >
> > There are no unusual errors in the tomcat or database server logs,
> > EXCEPT this
> > one: java.sql.DriverManager.getConnection
> >
> > During the periods of slowness, we see lots of those errors along with
> > a large spike in the number of stuck tomcat threads (from 1 or 2 to as
> > high as 100). It seems obvious that the threads are stuck because
> > tomcat is waiting on a connection to the database. However, tcpdump
> > shows that connectivity to the database is perfect at the network and
> > application layers. There are no unanswered SYNs, no retransmissions,
> > no half-open connections, no failures to allocate TCP ports, no
> > conntrack messages, and no other indications of system resource
> > exhaustion. Every time tomcat requests a connection to the DB, it
> > completes in less than 1 ms. Ten thousand connection attempts completed
> successfully in about 15 seconds, with zero failures.
> >
> > We are forced to conclude that some database connection requests are
> > being initiated but are not being sent on the wire. The problem seems
> > to be in the interaction between tomcat and the database driver, or in the
> driver itself.
> > Unfortunately, the application vendor is taking the "it's your 
> > infrastructure"
> > position without providing any evidence or offering suggestions for
> > configuration changes, other than to deploy more tomcat instances,
> > which is just shooting in the dark. They don't know why the software
> > is throwing java.sql.DriverManager.getConnection errors (even though
> > it's their code), and they've relegated the investigation to us.
> >
> > Any advice from the community would be greatly appreciated.
> >
> > RHEL 8.9, kernel 4.18.0-513.18.1.el8_9.x86_64 Apache Tomcat/9.0.80,
> > JVM
> > 1.8.0_372-b07
> >
> > (The tomcat and JVM versions are the ones recommended by the vendor.)
> >
> > We're standing by to provide whatever other information the community
> > may need.
> >
> > Thanks tons!
> >
> > -Eric
>
> The database connections are usually pooled.
> If the pool is exhausted, the thread will wait until a connection is returned
> to the pool and can be reused.
> Do you use connection pooling?
> What does the configuration look like?
> Do you monitor the pool usage?
>
> In general, it doesn’t look like a Tomcat issue per se.
>
> Greetings,
> Thomas
>

I have asked the vendor that question several times, but their technicians have 
never provided a clear answer. Most of the time they have not even understood 
the question. If pooling were enabled, I would expect to see maxTotal 
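
For reference, if the vendor did expose a JNDI-based configuration (as Mark
suggested), a pooled DataSource would typically appear in the webapp's
META-INF/context.xml along these lines. This is only a sketch; the resource
name, credentials, and limits are placeholders, not the vendor's actual settings:

```xml
<!-- META-INF/context.xml (sketch; names and values are placeholders) -->
<Context>
  <Resource name="jdbc/AppDB"
            auth="Container"
            type="javax.sql.DataSource"
            driverClassName="com.mysql.cj.jdbc.Driver"
            url="jdbc:mysql://dbhost:3306/appdb"
            username="appuser" password="changeme"
            maxTotal="100" maxIdle="20" maxWaitMillis="10000"
            removeAbandonedOnBorrow="true"
            removeAbandonedTimeout="60"
            logAbandoned="true"/>
</Context>
```

With a resource like this in place, pool attributes such as maxTotal are visible
via JMX; their absence is one hint that no container-managed pooling is configured.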

Database Connection Requests Initiated but Not Sent on the Wire (Some, Not All)

2024-05-25 Thread Eric Robinson
One of our hosting customers is a medical practice using a commercial EMR 
running on tomcat+mysql. It has operated well for over a year, but users have 
suddenly begun experiencing slowness for about an hour at the same time every 
day. During the slow times, we've done all the usual troubleshooting to catch 
the problem in the act. The servers have plenty of power and are not 
overworked. There are no slow database queries. Network connectivity is solid. 
Tomcat has plenty of memory. The numbers of database connections, threads, 
questions, queries, etc., remain steady, without spikes. There is no unusual 
disk latency. We have not found any maintenance tasks running during that 
timeframe.

The customer has another load-balanced tomcat instance on a different physical 
server, and the problem happens on that one, too. The servers were upgraded 
with a new kernel and packages on 4/5/24, but the issue did not appear until 
5/6/24. The vendor enabled a new feature in the customer's software, and the 
problem appeared the next day, but they subsequently disabled the feature, and 
(reportedly) the problem did not go away. It is worth mentioning that the 
servers are multi-tenanted, with other customers running the same medical 
application, but the others do not experience the slowdowns, even though they 
are on the same servers.

There are no unusual errors in the tomcat or database server logs, EXCEPT this 
one: java.sql.DriverManager.getConnection

During the periods of slowness, we see lots of those errors along with a large 
spike in the number of stuck tomcat threads (from 1 or 2 to as high as 100). It 
seems obvious that the threads are stuck because tomcat is waiting on a 
connection to the database. However, tcpdump shows that connectivity to the 
database is perfect at the network and application layers. There are no 
unanswered SYNs, no retransmissions, no half-open connections, no failures to 
allocate TCP ports, no conntrack messages, and no other indications of system 
resource exhaustion. Every time tomcat requests a connection to the DB, it 
completes in less than 1 ms. Ten thousand connection attempts completed 
successfully in about 15 seconds, with zero failures.

We are forced to conclude that some database connection requests are being 
initiated but are not being sent on the wire. The problem seems to be in the 
interaction between tomcat and the database driver, or in the driver itself. 
Unfortunately, the application vendor is taking the "it's your infrastructure" 
position without providing any evidence or offering suggestions for 
configuration changes, other than to deploy more tomcat instances, which is 
just shooting in the dark. They don't know why the software is throwing 
java.sql.DriverManager.getConnection errors (even though it's their code), and 
they've relegated the investigation to us.

Any advice from the community would be greatly appreciated.

RHEL 8.9, kernel 4.18.0-513.18.1.el8_9.x86_64
Apache Tomcat/9.0.80, JVM 1.8.0_372-b07

(The tomcat and JVM versions are the ones recommended by the vendor.)

We're standing by to provide whatever other information the community may need.

Thanks tons!

-Eric





RE: Can We Disable Chunked Encoding?

2023-07-24 Thread Eric Robinson
My apologies. I wasn't aware that something else besides the subject line 
identifies a thread. I thought changing the subject line *IS* starting a new 
thread. Thanks for letting me know. For my own edification, what does the list 
look for in a message to identify the thread?

> -Original Message-
> From: Mark Thomas 
> Sent: Thursday, July 6, 2023 3:13 AM
> To: users@tomcat.apache.org
> Subject: Re: Can We Disable Chunked Encoding?
>
> Please don't hijack threads by replying to a previous message and changing
> the subject. Start a new thread by sending a new message to the list.
>
> You also need to provide some version information.
>
> Mark
>
>
> On 06/07/2023 00:36, Eric Robinson wrote:
> > We've been seeing problems with failed requests where the response comes
> back with duplicate chunked encoding headers:
> >
> > [Response]
> >
> > HTTP/1.1 200
> > Strict-Transport-Security: max-age=86400; includeSubDomains;
> > Cache-Control: no-cache,no-store
> > isAuthenticated: true
> > X-FRAME-OPTIONS: SAMEORIGIN
> > Transfer-Encoding: chunked  <<<<<<<<<<<<<
> > X-XSS-Protection: 1; mode=block
> > vary: accept-encoding
> > Content-Encoding: gzip
> > Content-Type: text/xml;charset=ISO-8859-1
> > Transfer-Encoding: chunked  <<<<<<<<<<<<<< Duplicate
> > Date: Wed, 05 Jul 2023 17:22:11 GMT
> >
> > This is a violation of RFC 7230, so our nginx proxy is dropping the request
> and returning a 502 bad gateway error. I've spoken to F5 about this, and
> there's no way to make nginx ignore this violation. Unfortunately, the app is 
> a
> canned product, and we don't have access to the code.
> >
> > Is there a way to disable that behavior in Tomcat?
> >
> > -Eric
> >
> >
> > Disclaimer : This email and any files transmitted with it are confidential 
> > and
> intended solely for intended recipients. If you are not the named addressee
> you should not disseminate, distribute, copy or alter this email. Any views or
> opinions presented in this email are solely those of the author and might not
> represent those of Physician Select Management. Warning: Although
> Physician Select Management has taken reasonable precautions to ensure no
> viruses are present in this email, the company cannot accept responsibility 
> for
> any loss or damage arising from the use of this email or attachments.
> >
> > -
> > To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
> > For additional commands, e-mail: users-h...@tomcat.apache.org
> >
>
> -
> To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
> For additional commands, e-mail: users-h...@tomcat.apache.org


-
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org



Can We Disable Chunked Encoding?

2023-07-05 Thread Eric Robinson
We've been seeing problems with failed requests where the response comes back 
with duplicate chunked encoding headers:

[Response]

HTTP/1.1 200
Strict-Transport-Security: max-age=86400; includeSubDomains;
Cache-Control: no-cache,no-store
isAuthenticated: true
X-FRAME-OPTIONS: SAMEORIGIN
Transfer-Encoding: chunked  <<<<<<<<<<<<<
X-XSS-Protection: 1; mode=block
vary: accept-encoding
Content-Encoding: gzip
Content-Type: text/xml;charset=ISO-8859-1
Transfer-Encoding: chunked  <<<<<<<<<<<<<< Duplicate
Date: Wed, 05 Jul 2023 17:22:11 GMT

This is a violation of RFC 7230, so our nginx proxy drops the response and 
returns a 502 Bad Gateway error. I've spoken to F5 about this, and there's no 
way to make nginx ignore the violation. Unfortunately, the app is a canned 
product, and we don't have access to the code.
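
One quick way to confirm the violation from a capture is to count the
Transfer-Encoding lines in the raw response headers (a sketch; the helper and
file names are mine):

```shell
# Count Transfer-Encoding header lines in a captured raw HTTP response.
# Capture headers first with, e.g.:  curl -skD headers.txt -o /dev/null https://app.example/
te_count() {
  grep -ci '^Transfer-Encoding:' "$1"
}
# A result greater than 1 is the duplicate-header condition nginx rejects.
```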

Is there a way to disable that behavior in Tomcat?

-Eric




RE: Do I Need Network NameSpaces to Solve This Tomcat+Connector/J Problem?

2022-01-03 Thread Eric Robinson
Hi Chris --

We have access to the configuration files, but not the source code. There is no 
"pool" reference in server.xml or any of the context.xml files. However, I did 
receive a call from a vendor tech this morning and they are exploring the 
question right now and will get back to me soon hopefully.

> -Original Message-
> From: Christopher Schultz 
> Sent: Monday, January 3, 2022 9:10 AM
> To: users@tomcat.apache.org
> Subject: Re: Do I Need Network NameSpaces to Solve This
> Tomcat+Connector/J Problem?
>
> Eric,
>
> On 12/30/21 19:03, Eric Robinson wrote:
> > If I want to ignore the vendor's recommendation and try connection
> > pooling anyway, is that something I can enable with a config file
> > setting, or do they actually have to trigger it from within their
> > code?
> That depends upon how they are obtaining database connections. If they are
> using the driver directly and NOT using a pool (why would they use a pool if
> they have a policy NOT to use it?) then there is likely nothing you can do.
>
> Are you able to look at the code? Are you able to look at the configuration?
> Specifically, the META-INF/context.xml file in the application and
> conf/server.xml for the server.
>
> If we can find a "pool" configuration in there, it's possible it just has 
> insane
> limits like maxActive="10" or something like that.
>
> -chris
>
> >> -Original Message-
> >> From: Eric Robinson 
> >> Sent: Thursday, December 30, 2021 12:00 PM
> >> To: Tomcat Users List 
> >> Subject: RE: Do I Need Network NameSpaces to Solve This
> >> Tomcat+Connector/J Problem?
> >>
> >> Chris,
> >>
> >>> Not pooling connections will very likely negatively affect performance.
> >>>
> >>> When you say "they ... have an issue with connection pooling" do you
> >>> mean that they have a technical problem, or do you mean that there
> >>> is some ill- conceived policy against them?
> >>>
> >>> Oh, maybe they are paranoid about cross-client leakage between
> >>> connections. Well, if the application can't be trusted not to leak
> >>> that kind of info, then it can't be trusted to make the connections
> >>> properly in the first place.
> >>>
> >>> -chris
> >>>
> >>
> >> Hard to say what their issue is. We've asked about implementing it
> >> before, but they don't support it. You know how software companies
> >> are. Maybe they had a technical problem with it years ago and have just
> not revisited it.
> >> They're stuck in a rut and there is too much inertia to get them out of it.
> >>
> >> --Eric
> >>
> >>
> >>
> >>
> >> Disclaimer : This email and any files transmitted with it are
> >> confidential and intended solely for intended recipients. If you are
> >> not the named addressee you should not disseminate, distribute, copy
> >> or alter this email. Any views or opinions presented in this email
> >> are solely those of the author and might not represent those of
> >> Physician Select Management. Warning: Although Physician Select
> >> Management has taken reasonable precautions to ensure no viruses are
> >> present in this email, the company cannot accept responsibility for any
> loss or damage arising from the use of this email or attachments.
> > Disclaimer : This email and any files transmitted with it are confidential 
> > and
> intended solely for intended recipients. If you are not the named addressee
> you should not disseminate, distribute, copy or alter this email. Any views or
> opinions presented in this email are solely those of the author and might not
> represent those of Physician Select Management. Warning: Although
> Physician Select Management has taken reasonable precautions to ensure
> no viruses are present in this email, the company cannot accept responsibility
> for any loss or damage arising from the use of this email or attachments.
> >
> > -
> > To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
> > For additional commands, e-mail: users-h...@tomcat.apache.org
> >
>
> -
> To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
> For additional commands, e-mail: users-h...@tomcat.apache.org


RE: Do I Need Network NameSpaces to Solve This Tomcat+Connector/J Problem?

2021-12-30 Thread Eric Robinson
Hi Rob,

> > On Dec 30, 2021, at 4:03 PM, Eric Robinson 
> wrote:
> >
> > Chris,
> >
> > If I want to ignore the vendor's recommendation and try connection
> pooling anyway, is that something I can enable with a config file setting, or 
> do
> they actually have to trigger it from within their code?
> >
>
> Up thread, didn’t you say tomcat was the client?  Are servlets in tomcat
> making db requests?  What database system is under this?
> >

Yes, tomcat is the client and the database is MySQL. There are, in fact, many 
tomcat instances on the same server, each connecting to its own dedicated MySQL 
database located in a farm of MySQL servers.

> >> -Original Message-
> >> From: Eric Robinson  >> <mailto:eric.robin...@psmnv.com>>
> >> Sent: Thursday, December 30, 2021 12:00 PM
> >> To: Tomcat Users List  >> <mailto:users@tomcat.apache.org>>
> >> Subject: RE: Do I Need Network NameSpaces to Solve This
> >> Tomcat+Connector/J Problem?
> >>
> >> Chris,
> >>
> >>> Not pooling connections will very likely negatively affect performance.
> >>>
> >>> When you say "they ... have an issue with connection pooling" do you
> >>> mean that they have a technical problem, or do you mean that there
> >>> is some ill- conceived policy against them?
> >>>
> >>> Oh, maybe they are paranoid about cross-client leakage between
> >>> connections. Well, if the application can't be trusted not to leak
> >>> that kind of info, then it can't be trusted to make the connections
> >>> properly in the first place.
> >>>
> >>> -chris
> >>>
> >>
> >> Hard to say what their issue is. We've asked about implementing it
> >> before, but they don't support it. You know how software companies
> >> are. Maybe they had a technical problem with it years ago and have just
> not revisited it.
> >> They're stuck in a rut and there is too much inertia to get them out of it.
> >>
> >> --Eric
> >>
> >>
> >>
> >>
> >> Disclaimer : This email and any files transmitted with it are
> >> confidential and intended solely for intended recipients. If you are
> >> not the named addressee you should not disseminate, distribute, copy
> >> or alter this email. Any views or opinions presented in this email
> >> are solely those of the author and might not represent those of
> >> Physician Select Management. Warning: Although Physician Select
> >> Management has taken reasonable precautions to ensure no viruses are
> >> present in this email, the company cannot accept responsibility for any
> loss or damage arising from the use of this email or attachments.
> > Disclaimer : This email and any files transmitted with it are confidential 
> > and
> intended solely for intended recipients. If you are not the named addressee
> you should not disseminate, distribute, copy or alter this email. Any views or
> opinions presented in this email are solely those of the author and might not
> represent those of Physician Select Management. Warning: Although
> Physician Select Management has taken reasonable precautions to ensure
> no viruses are present in this email, the company cannot accept responsibility
> for any loss or damage arising from the use of this email or attachments.
> >
> > -
> > To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
> > <mailto:users-unsubscr...@tomcat.apache.org>
> > For additional commands, e-mail: users-h...@tomcat.apache.org
> > <mailto:users-h...@tomcat.apache.org>

-
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org



RE: Do I Need Network NameSpaces to Solve This Tomcat+Connector/J Problem?

2021-12-30 Thread Eric Robinson
Chris,

If I want to ignore the vendor's recommendation and try connection pooling 
anyway, is that something I can enable with a config file setting, or do they 
actually have to trigger it from within their code?


> -Original Message-
> From: Eric Robinson 
> Sent: Thursday, December 30, 2021 12:00 PM
> To: Tomcat Users List 
> Subject: RE: Do I Need Network NameSpaces to Solve This
> Tomcat+Connector/J Problem?
>
> Chris,
>
> > Not pooling connections will very likely negatively affect performance.
> >
> > When you say "they ... have an issue with connection pooling" do you
> > mean that they have a technical problem, or do you mean that there is
> > some ill- conceived policy against them?
> >
> > Oh, maybe they are paranoid about cross-client leakage between
> > connections. Well, if the application can't be trusted not to leak
> > that kind of info, then it can't be trusted to make the connections
> > properly in the first place.
> >
> > -chris
> >
>
> Hard to say what their issue is. We've asked about implementing it before,
> but they don't support it. You know how software companies are. Maybe
> they had a technical problem with it years ago and have just not revisited it.
> They're stuck in a rut and there is too much inertia to get them out of it.
>
> --Eric
>
>
>
>
> Disclaimer : This email and any files transmitted with it are confidential and
> intended solely for intended recipients. If you are not the named addressee
> you should not disseminate, distribute, copy or alter this email. Any views or
> opinions presented in this email are solely those of the author and might not
> represent those of Physician Select Management. Warning: Although
> Physician Select Management has taken reasonable precautions to ensure
> no viruses are present in this email, the company cannot accept responsibility
> for any loss or damage arising from the use of this email or attachments.

-
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org



RE: Do I Need Network NameSpaces to Solve This Tomcat+Connector/J Problem?

2021-12-30 Thread Eric Robinson
> José,
>
> > -Original Message-
> > From: José Cornado 
> > Sent: Thursday, December 30, 2021 12:00 PM
> > To: Tomcat Users List 
> > Subject: Re: Do I Need Network NameSpaces to Solve This
> > Tomcat+Connector/J Problem?
> >
> > But they do not get a corresponding database instance?
> >
>
> They do. Each tomcat instance has a corresponding database instance
> listening on its own dedicated port. Even so, we've seen cases where all the
> available client ports are exhausted.
>
> This raises the question, does the Linux ip_local_port_range shown here...
>
> $ cat /proc/sys/net/ipv4/ip_local_port_range
> 32768   61000
>
> ...apply globally, or on a per-socket basis? I would think that it should 
> apply
> per socket, but in practice it seems to be a global limitation.
>
> -Eric
>
>

I need to correct myself. My testing confirms that it is per-socket.

-Eric


RE: Do I Need Network NameSpaces to Solve This Tomcat+Connector/J Problem?

2021-12-30 Thread Eric Robinson
José,

> -Original Message-
> From: José Cornado 
> Sent: Thursday, December 30, 2021 12:00 PM
> To: Tomcat Users List 
> Subject: Re: Do I Need Network NameSpaces to Solve This
> Tomcat+Connector/J Problem?
>
> But they do not get a corresponding database instance?
>

They do. Each tomcat instance has a corresponding database instance listening 
on its own dedicated port. Even so, we've seen cases where all the available 
client ports are exhausted.

This raises the question, does the Linux ip_local_port_range shown here...

$ cat /proc/sys/net/ipv4/ip_local_port_range
32768   61000

...apply globally, or on a per-socket basis? I would think that it should apply 
per socket, but in practice it seems to be a global limitation.

-Eric




RE: Do I Need Network NameSpaces to Solve This Tomcat+Connector/J Problem?

2021-12-30 Thread Eric Robinson
Chris,

> Not pooling connections will very likely negatively affect performance.
>
> When you say "they ... have an issue with connection pooling" do you mean
> that they have a technical problem, or do you mean that there is some ill-
> conceived policy against them?
>
> Oh, maybe they are paranoid about cross-client leakage between
> connections. Well, if the application can't be trusted not to leak that kind 
> of
> info, then it can't be trusted to make the connections properly in the first
> place.
>
> -chris
>

Hard to say what their issue is. We've asked about implementing it before, but 
they don't support it. You know how software companies are. Maybe they had a 
technical problem with it years ago and have just not revisited it. They're 
stuck in a rut and there is too much inertia to get them out of it.

--Eric






RE: Do I Need Network NameSpaces to Solve This Tomcat+Connector/J Problem?

2021-12-30 Thread Eric Robinson
Chris,

> Stupid question: can your database (meaningfully) handle the number of
> connections you are making to it? Let's say you have 5000 connections per
> Tomcat instance to your database, and you want 500 Tomcat instances.
> That means 2,500,000 database connections. If every single one of those is
> executing a query, will your database melt or are you okay?
>
> Are you pooling database connections? Are you sure you need thousands-at-
> a-time for each Tomcat instance?
>
> I'm not saying that you absolutely /do not/ need this kind of thing, but "most
> people" don't need that kind of concurrency.
>

A fair question. The load is spread over multiple database servers, and they 
are capable of handling the connection count. The problem we've encountered in 
the past was on the client side, where it would exhaust the available client 
ports. We alleviated that by increasing the client port range, but the problem 
will return when we quintuple the tomcat instance count.
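
For the record, widening the client port range is a one-line sysctl; the values
below are illustrative, not our exact setting:

```
# /etc/sysctl.conf -- widen the ephemeral (client) port range, then run `sysctl -p`
net.ipv4.ip_local_port_range = 10000 65000
```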

--Eric




RE: Do I Need Network NameSpaces to Solve This Tomcat+Connector/J Problem?

2021-12-30 Thread Eric Robinson
Stefan,

> A third option could be to add something between database client and
> server. Something on layer 4 like multiple HAProxy servers or simple NAT
> gateways. Or more complex on layer 7 specfic products like ProxySQL or
> MaxScale. They could even pool connections and reduce the load on the
> database server. But this all adds complexity and new ways to fail.
>
> The easiest solution in terms of implementation and operation is the one
> Mark suggested: add multiple ip addresses and/or ports to your database
> listener.
>
> Regards,
>
>Stefan
>

My original idea was to add multiple source IPs to the app server. Mark's 
suggestion is similar, except we would change the destination IPs. Either way, 
it opens up the opportunity to have more unique sockets. Both suggestions would 
work. I'm just wondering if there is something I can do with network namespaces 
that would be even better. I don't have any experience with using it.

-Eric




RE: Do I Need Network NameSpaces to Solve This Tomcat+Connector/J Problem?

2021-12-30 Thread Eric Robinson
José,

> Is this setup going to be open to the world or just a big organization? A big
> organization would put a cap on the number of users. Then maybe they
> could divide those between the tomcat instances, and thus the db server.
>

It's a SaaS solution, where each customer organization gets its own tomcat 
instance.

-Eric





RE: Do I Need Network NameSpaces to Solve This Tomcat+Connector/J Problem?

2021-12-30 Thread Eric Robinson
Mark,

> > My question is, is there a better way?
>
> I can only think of variations on a theme.
>
> The ~64k limit assumes client IP, server IP and server port remain constant.
> i.e. just client port is varying.
>
> That suggests there is a single IP for the database server and that it is
> listening on a single port.
>
> You are currently varying client IP. Varying server IP is unlikely to be any
> different in terms of ease of management etc.
>
> There may be more mileage in getting the database server to listen on more
> than one port. It depends how the database server is structured. If it can have
> multiple listeners all passing connections to the same database instance then
> adding db listeners might be a simpler way to manage this.
>
> Mark

In reality, there are multiple database servers. Even so, we have seen cases 
where the vendor software rapidly consumed huge numbers of TCP connections due 
to some function of the java code, and started throwing errors about not being 
able to open sockets. We alleviated the issue by increasing the number of 
available client ports, but there were fewer than 100 tomcat instances on the 
app server at the time. When we increase the count to 500, the problem will 
reappear unless we can figure out a way to distribute client port usage. 
That's why I came up with the idea of using multiple source IPs on the app 
server. I am new to network namespaces and thought they might be a better 
solution, but I have no experience with them.
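To put rough numbers on it, the source-IP count is just a ceiling division; the per-instance connection figure below is an assumption for illustration, not a measurement from our environment:

```python
# Back-of-envelope sizing: how many source IPs does the app server need?
# Assumes ~64,000 usable ephemeral ports per (src IP, dst IP, dst port)
# tuple, and a guessed peak connection count per tomcat instance.
EPHEMERAL_PORTS = 64_000
instances = 500
conns_per_instance = 300  # hypothetical peak, not measured

total_conns = instances * conns_per_instance
ips_needed = -(-total_conns // EPHEMERAL_PORTS)  # ceiling division
print(ips_needed)  # -> 3
```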

-Eric



-
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org



RE: Do I Need Network NameSpaces to Solve This Tomcat+Connector/J Problem?

2021-12-30 Thread Eric Robinson
Hi Simon,


> I guess the database is not on the Tomcat host, otherwise you could connect
> via unix domain socket to avoid the limitations of TCP port numbers.
>
> Otherwise I think you could run a db proxy where your Tomcat clients
> connect locally via unix domain socket and the proxy relays the queries to the
> db backend.
>
> Regards,
> Simon
>

Your guess is correct.
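If the proxy route Simon describes were ever tried, a minimal HAProxy TCP relay might look like this; the socket path and address are hypothetical:

```
# Hypothetical HAProxy snippet: local unix socket in, TCP out to the db.
# Tomcat connects to the socket; only the proxy consumes TCP client ports.
listen mysql-relay
    bind unix@/var/run/mysql-relay.sock mode 600
    mode tcp
    server db1 10.0.0.5:3306
```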

-Eric





RE: Do I Need Network NameSpaces to Solve This Tomcat+Connector/J Problem?

2021-12-29 Thread Eric Robinson
> Your problem seems to be in the client-to-db server side of things. Not
> tomcat as a server.
>

In the context of this question, tomcat is the client.

> On Wed, Dec 29, 2021 at 2:11 PM Eric Robinson 
> wrote:
>
> > We want to run a large number of tomcat instances on the same server
> > without virtualization or containerization. Each instance is executed
> > from its own folder tree and listens on its own unique TCP port. Each
> > instance will run code that connects to a backend database server to
> > send queries that are triggered by JSP calls from users. We’ve done
> > this successfully with up to 120 instances of tomcat running on the
> > same server while avoiding the overhead of virtualization and the
> complexity of containers.
> > Based on our experience over the past decade, we know that we could
> > potentially host 500 or more separate tomcat instances on the same
> > server without running into performance problems. So now we want to
> > make it 500 parallel instances.
> >
> >
> >
> > Here’s the problem. When tomcat initiates an outbound connection (for
> > example, with Connector/J to query a backend database) it establishes
> > a socket, and the socket has a client port. With thousands of users
> > making requests that require the tomcat services to query back end
> > databases, the OS can easily run out of available client ports to
> > allocate to sockets. To avoid that problem, we can assign multiple IPs
> > to the server and use the localSocketAddress property of Connector/J
> > to group tomcats such that only a subset of them each use the same
> > source IP. Then each group will have its own range of 64,000-ish client
> ports. I’ve tested this and it works.
> >
> >
> >
> > My question is, is there a better way?
> >
> >
> >
> >
> >
> >
> >
> >




RE: Do I Need Network NameSpaces to Solve This Tomcat+Connector/J Problem?

2021-12-29 Thread Eric Robinson
> -Original Message-
> From: Mark Eggers 
> Sent: Wednesday, December 29, 2021 6:18 PM
> To: users@tomcat.apache.org
> Subject: Re: Do I Need Network NameSpaces to Solve This
> Tomcat+Connector/J Problem?
>
> Eric:
>
> On 12/29/2021 1:04 PM, Eric Robinson wrote:
> > We want to run a large number of tomcat instances on the same server
> without virtualization or containerization. Each instance is executed from its
> own folder tree and listens on its own unique TCP port. Each instance will run
> code that connects to a backend database server to send queries that are
> triggered by JSP calls from users. We've done this successfully with up to 120
> instances of tomcat running on the same server while avoiding the overhead
> of virtualization and the complexity of containers. Based on our experience
> over the past decade, we know that we could potentially host 500 or more
> separate tomcat instances on the same server without running into
> performance problems. So now we want to make it 500 parallel instances.
> >
> >
> > Here's the problem. When tomcat initiates an outbound connection (for
> example, with Connector/J to query a backend database) it establishes a
> socket, and the socket has a client port. With thousands of users making
> requests that require the tomcat services to query back end databases, the
> OS can easily run out of available client ports to allocate to sockets. To 
> avoid
> that problem, we can assign multiple IPs to the server and use the
> localSocketAddress property of Connector/J to group tomcats such that only
> a subset of them each use the same source IP. Then each group will have its
> own range of 64,000-ish client ports. I've tested this and it works.
> >
> >
> >
> > My question is, is there a better way?
>
> Are you using database connection pooling? If you are, wouldn't the
> outbound connections to the database from a particular Tomcat be limited to
> the maxTotal in your context.xml (maxActive in Tomcat 7).
>
> So unless you're using a huge pool, wouldn't the required number of
> outbound ports be fairly small?
>
> . . . just my two cents
> /mde/

Unfortunately, we are under vendor constraints. They apparently have an issue 
with connection pooling.
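For the record, the JNDI-pooled setup Mark (Eggers) alludes to is a `<Resource>` element in context.xml. A minimal sketch, with made-up names and limits (attribute names are those of the Tomcat 8.5+/DBCP2 pool):

```xml
<!-- Hypothetical context.xml fragment; names, credentials, and limits
     are illustrative only -->
<Resource name="jdbc/AppDB" auth="Container"
          type="javax.sql.DataSource"
          driverClassName="com.mysql.cj.jdbc.Driver"
          url="jdbc:mysql://db.example.com:3306/appdb"
          username="appuser" password="secret"
          maxTotal="20" maxIdle="5"
          removeAbandonedOnBorrow="true"
          removeAbandonedTimeout="60"/>
```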

-Eric






Do I Need Network NameSpaces to Solve This Tomcat+Connector/J Problem?

2021-12-29 Thread Eric Robinson
We want to run a large number of tomcat instances on the same server without 
virtualization or containerization. Each instance is executed from its own 
folder tree and listens on its own unique TCP port. Each instance will run code 
that connects to a backend database server to send queries that are triggered 
by JSP calls from users. We've done this successfully with up to 120 instances 
of tomcat running on the same server while avoiding the overhead of 
virtualization and the complexity of containers. Based on our experience over 
the past decade, we know that we could potentially host 500 or more separate 
tomcat instances on the same server without running into performance problems. 
So now we want to make it 500 parallel instances.


Here's the problem. When tomcat initiates an outbound connection (for example, 
with Connector/J to query a backend database) it establishes a socket, and the 
socket has a client port. With thousands of users making requests that require 
the tomcat services to query back end databases, the OS can easily run out of 
available client ports to allocate to sockets. To avoid that problem, we can 
assign multiple IPs to the server and use the localSocketAddress property of 
Connector/J to group tomcats such that only a subset of them each use the same 
source IP. Then each group will have its own range of 64,000-ish client ports. 
I've tested this and it works.



My question is, is there a better way?





RE: 500 instances of tomcat on the same server

2021-06-29 Thread Eric Robinson
> -Original Message-
> From: Berneburg, Cris J. - US 
> Sent: Tuesday, June 29, 2021 7:16 AM
> To: users@tomcat.apache.org
> Subject: RE: 500 instances of tomcat on the same server
>
> Eric and Mark
>
> Just curious...
>
> Eric> We can run 75 to 125 instances of tomcat on a single Linux server
>
> Eric, Do you have or need a centralized way of managing all those instances?

Hi Cris and Mark,

We have about 1500 instances of tomcat (750 load-balanced virtual services 
across 20 physical servers). We currently manage the environment with scripts. 
Due to the simplicity and consistency of our internal deployment standards, it 
works well and is pretty easy to manage on an instance-by-instance basis, but a 
web console where we can see the environment as a whole, or filter on portions 
of it, would be amazing!

> It sounds like different support groups connect to their own instances, if I
> understand correctly.
>

Same software vendor, different technicians. When they call, they are seeking 
to support an individual customer. We give them an interface through which they 
see a sandboxed representation of that customer's deployment. The servers 
themselves are multi-tenanted, but when the support techs connect, it looks to 
them like a single instance. Each customer is running the same canned webapp, 
but possibly different versions of it, with different memory configurations, 
and requiring different versions of tomcat and the JVM.


> Mark> if there are changes we could make to Tomcat that would it easier
> Mark> to run and manage that many instances do let us know.
> Mark> We'd be happy to consider them.
>
> Mark, did you already have something in mind?  Like a TC Manager-manager?
> Some sort of dashboard that is able to perform TC Manager ops against all
> the instances?
>
> --
> Cris Berneburg
> CACI Senior Software Engineer
>
>
> 
>
> This electronic message contains information from CACI International Inc or
> subsidiary companies, which may be company sensitive, proprietary,
> privileged or otherwise protected from disclosure. The information is
> intended to be used solely by the recipient(s) named above. If you are not
> an intended recipient, be aware that any review, disclosure, copying,
> distribution or use of this transmission or its contents is prohibited. If you
> have received this transmission in error, please notify the sender
> immediately.
>
> -
> To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
> For additional commands, e-mail: users-h...@tomcat.apache.org





RE: 500 instances of tomcat on the same server

2021-06-28 Thread Eric Robinson
Guido,

I think you intended that message for me, not Brian. Thanks much for the 
feedback. I have been reading about Kubernetes, but I got discouraged when I 
saw that they dropped Docker support, since Docker seems to be the most popular 
containeriziation technology. Also, most of the Kubernetes tutorials I saw on 
YouTube seem to approach it as a dev platform, and we're not developers.

-Eric


> -Original Message-
> From: Guido Jäkel 
> Sent: Monday, June 28, 2021 2:43 PM
> To: Brian Wolfe 
> Cc: Tomcat Users List 
> Subject: Re: 500 instances of tomcat on the same server
>
> Dear Brian,
>
> please take the time to read about Linux Kernel namespaces as the technical
> base of containers. It's like two viewpoints to one thing. Take the network
> namespace as an example: From the conceptual point of view it looks like
> you have N independent, functionally identical "IP stacks". But from the
> technical point of view, it's just the "well known" single instance just with 
> an
> additional field at all items that need this (packets, routing tables, ...) 
> to take
> a tag value that identify the namespace instance.
>
> You may use namespaces with the raw tools like enterns or with LXC or
> Dockers. During runtime of a started container, there's nothing more you
> have to trust but the kernel because for the basics, there's no need of
> additional userland processes to keep a container running.
>
> To run an application in a "container", you start it with a bunch of 
> instances of
> this namespaces, at least the process namespace. You'll probably take the
> same name for the technical namespace instances - from the conceptual
> point of view this is the name of the container.
>
> Most will start something like the init binary located in a directory tree of 
> a
> small Linux distribution userland. This may "boot" common services and the
> result may act like an "independent platform". But you may also launch just
> single high-level applications like a JVM running a Tomcat.
>
> That's very close to your architecture, but much more easy to handle. For the
> network stack e.g. you may use the same ports for listeners and have the full
> range of ports available for connections in each namespace. There are
> different ways available to route the traffic, but in any case you may use
> individual IPs in each namespace.
>
> greetings
>
> Guido
>
> On 2021-06-28 19:22, Brian Wolfe wrote:
> > Generally, I'd agree too. We are considering using containers, but I'm
> > not yet sure what that buys us in terms of stability.
>
> -
> To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
> For additional commands, e-mail: users-h...@tomcat.apache.org
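As a concrete illustration of Guido's point, giving one tomcat its own network namespace might look like the following. This is an untested command sketch only (requires root); all names, addresses, and paths are made up:

```
# Command sketch only -- names and addresses are hypothetical
ip netns add tenant1
ip link add veth-t1 type veth peer name veth-t1-ns
ip link set veth-t1-ns netns tenant1
ip -n tenant1 addr add 10.8.0.2/24 dev veth-t1-ns
ip -n tenant1 link set veth-t1-ns up
ip netns exec tenant1 /opt/tomcat/tenant1/bin/startup.sh
```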



RE: 500 instances of tomcat on the same server

2021-06-28 Thread Eric Robinson
> -Original Message-
> From: Brian Wolfe 
> Sent: Monday, June 28, 2021 12:23 PM
> To: Tomcat Users List 
> Subject: Re: 500 instances of tomcat on the same server
>
> I tend to agree with the initial assessment from Mark, your only issue would
> be on the OS level. # of file descriptors for connections. That many tomcat
> servers and your gonna start using a lot of ports and push the OS limits on 
> file
> read/write capabilities.
>

Those are some of my concerns as well, which is why I asked the question. I can 
work around the limitation on ephemeral client ports by adding additional IPs 
to the box and using the localSocketAddress property of Connector/J. I can set 
the max file descriptors to something like half a million. Are there other 
potential limitations you can think of?
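Raising the descriptor ceiling is itself just configuration, e.g. via limits.d; the user name and values below are illustrative:

```
# /etc/security/limits.d/tomcat.conf -- illustrative values
tomcat  soft  nofile  524288
tomcat  hard  nofile  524288
```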

> From an architecture perspective you should probably work on moving to a
> more modern deployment model of containerization of these apps. You
> would be better served by containerizing each customer deployment and
> running them on a kubernetes cluster. you can avoid the need for having
> large machines and scale more appropriately. and moving between hardware
> would be as simple as adding/removing nodes to your cluster.

It is 2 cents well spent. I have also considered Kubernetes and 
containerization, but I don't yet understand it well enough to know exactly how 
it benefits me.

>It sounds like
> the apps must be simple to be able to scale it to different clients like that.
> just my 2 cents.
>

Not simple, but predictable. We've been hosting it for over a decade, and we have 
a good feel for its resource utilization.


-Eric

> On Mon, Jun 28, 2021 at 1:12 PM Eric Robinson 
> wrote:
>
> >
> >
> >
> >
> > > -Original Message-
> > > From: Mark Thomas 
> > > Sent: Monday, June 28, 2021 9:04 AM
> > > To: users@tomcat.apache.org
> > > Subject: Re: 500 instances of tomcat on the same server
> > >
> > > On 28/06/2021 14:53, Christopher Schultz wrote:
> > > > Eric,
> > > >
> > > > On 6/25/21 22:58, Eric Robinson wrote:
> > > >> We can run 75 to 125 instances of tomcat on a single Linux server
> > > >> with
> > > >> 12 cores and 128GB RAM. It works great. CPU is around 25%, our
> > > >> JVMs are not throwing OOMEs, iowait is minimal, and network
> > > >> traffic is about 30Mbps. We're happy with the results.
> > > >>
> > > >> Now we're upping the ante. We have a 48-core server with 1TB RAM,
> > > >> and we're planning to run 600+ tomcat instances on it simultaneously.
> > > >> What caveats or pitfalls should we watch out for? Are there any
> > > >> hard limits that would prevent this from working as expected?
> > > > If you have the resources, I see no reason why this would present
> > > > any problems.
> > > >
> > > > On the other hand, what happens when you need to upgrade the OS
> on
> > > > this beast? You are now talking about disturbing not 72-125
> > > > clients, but 600 of them.
> > > >
> > > > If I had a beast like this, I'd run VMWare (or similar) on it,
> > > > carve it up into virtual machines, and run fewer clients on
> > > > each just for the sheer flexibility of it.
> > > That just moves the goal posts. You'll have the same issue when the
> > > hypervisor needs updating (which admittedly may need a reboot less
> > > often than the OS).
> > >
> > > > If this is already a virtualized/cloud environment, then I think
> > > > you're doing it wrong: don't provision one huge instance and use
> > > > it for multiple clients. Instead, provision lots of small
> > > > instances and use them for fewer (or even 1) at a time.
> > >
> > > But it adds the overhead of an OS for each instance. And costs if
> > > you
> > have to
> > > pay for that OS instance.
> > >
> >
> > The overhead issue is an important factor. The other is the fact that
> > it's a canned app, supported by the publisher, and doing it our way
> > pays big dividends in terms of that workflow.
> >
> > > As always there are trade-offs to be made and the "right" answer
> > > will
> > vary
> > > based on circumstances and what you are trying to optimize for. I do
> > agree
> > > that, generally, more smaller instances will be a closer fit to more
> > > use
> > cases
> > > but that is only a general answer.
> > >
> >
> 

RE: 500 instances of tomcat on the same server

2021-06-28 Thread Eric Robinson





> -Original Message-
> From: Mark Thomas 
> Sent: Monday, June 28, 2021 9:04 AM
> To: users@tomcat.apache.org
> Subject: Re: 500 instances of tomcat on the same server
>
> On 28/06/2021 14:53, Christopher Schultz wrote:
> > Eric,
> >
> > On 6/25/21 22:58, Eric Robinson wrote:
> >> We can run 75 to 125 instances of tomcat on a single Linux server
> >> with
> >> 12 cores and 128GB RAM. It works great. CPU is around 25%, our JVMs
> >> are not throwing OOMEs, iowait is minimal, and network traffic is
> >> about 30Mbps. We're happy with the results.
> >>
> >> Now we're upping the ante. We have a 48-core server with 1TB RAM, and
> >> we're planning to run 600+ tomcat instances on it simultaneously.
> >> What caveats or pitfalls should we watch out for? Are there any hard
> >> limits that would prevent this from working as expected?
> > If you have the resources, I see no reason why this would present any
> > problems.
> >
> > On the other hand, what happens when you need to upgrade the OS on
> > this beast? You are now talking about disturbing not 72-125 clients,
> > but 600 of them.
> >
> > If I had a beast like this, I'd run VMWare (or similar) on it, carve
> > it up into virtual machines, and run fewer clients on each just
> > for the sheer flexibility of it.
> That just moves the goal posts. You'll have the same issue when the
> hypervisor needs updating (which admittedly may need a reboot less often
> than the OS).
>
> > If this is already a virtualized/cloud environment, then I think
> > you're doing it wrong: don't provision one huge instance and use it
> > for multiple clients. Instead, provision lots of small instances and
> > use them for fewer (or even 1) at a time.
>
> But it adds the overhead of an OS for each instance. And costs if you have to
> pay for that OS instance.
>

The overhead issue is an important factor. The other is the fact that it's a 
canned app, supported by the publisher, and doing it our way pays big dividends 
in terms of that workflow.

> As always there are trade-offs to be made and the "right" answer will vary
> based on circumstances and what you are trying to optimize for. I do agree
> that, generally, more smaller instances will be a closer fit to more use cases
> but that is only a general answer.
>

Generally, I'd agree too. We are considering using containers, but I'm not yet 
sure what that buys us in terms of stability.

> Mark
>
> -
> To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
> For additional commands, e-mail: users-h...@tomcat.apache.org





RE: 500 instances of tomcat on the same server

2021-06-28 Thread Eric Robinson
> -Original Message-
> From: Christopher Schultz 
> Sent: Monday, June 28, 2021 8:54 AM
> To: users@tomcat.apache.org
> Subject: Re: 500 instances of tomcat on the same server
>
> Eric,
>
> On 6/25/21 22:58, Eric Robinson wrote:
> > We can run 75 to 125 instances of tomcat on a single Linux server with
> > 12 cores and 128GB RAM. It works great. CPU is around 25%, our JVMs
> > are not throwing OOMEs, iowait is minimal, and network traffic is
> > about 30Mbps. We're happy with the results.
> >
> > Now we're upping the ante. We have a 48-core server with 1TB RAM, and
> > we're planning to run 600+ tomcat instances on it simultaneously.
> > What caveats or pitfalls should we watch out for? Are there any hard
> > limits that would prevent this from working as expected?
> If you have the resources, I see no reason why this would present any
> problems.
>
> On the other hand, what happens when you need to upgrade the OS on this
> beast? You are now talking about disturbing not 72-125 clients, but 600 of
> them.
>

There are two load-balanced servers, each with adequate power to support the 
whole load. When we want to maintain Server A, we drain it at the load balancer 
and wait for the last active connection to complete. Then we reboot/maintain 
the server and add it back into the rotation gracefully.

> If I had a beast like this, I'd run VMWare (or similar) on it, carve it up 
> into
> virtual machines, and run fewer clients on each just for the sheer 
> flexibility
> of it.
>

We considered doing it that way. Performance is top priority, so we ultimately 
decided to run the instances on metal rather than introducing a few trillion 
lines of OS code into the mix.  We might consider containerizing.


> If this is already a virtualized/cloud environment, then I think you're doing 
> it
> wrong: don't provision one huge instance and use it for multiple clients.
> Instead, provision lots of small instances and use them for fewer (or even 1)
> at a time.
>

That makes sense until you know the environment better. It's a canned 
application and we're not the publisher. Breaking it out this way lets us 
present each customer as a unique entity to the publisher for support purposes. 
When their techs connect, the sandbox allows them to troubleshoot and support 
our mutual customer independently, which puts them in an environment they are 
comfortable with and removes the risk of them doing something that impacts 
everybody on the server (or in the VM, if we used those).

All I can tell you is we've been running it this way for 15 years and we've 
never looked back and wished we were doing it differently.

> -chris
>
> -
> To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
> For additional commands, e-mail: users-h...@tomcat.apache.org





RE: 500 instances of tomcat on the same server

2021-06-26 Thread Eric Robinson


> -Original Message-
> From: Shawn Heisey 
> Sent: Saturday, June 26, 2021 8:09 PM
> To: users@tomcat.apache.org
> Subject: Re: 500 instances of tomcat on the same server
>
> On 6/25/2021 8:58 PM, Eric Robinson wrote:
> > We can run 75 to 125 instances of tomcat on a single Linux server with 12
> cores and 128GB RAM. It works great. CPU is around 25%, our JVMs are not
> throwing OOMEs, iowait is minimal, and network traffic is about 30Mbps.
> We're happy with the results.
> >
> > Now we're upping the ante. We have a 48-core server with 1TB RAM, and
> we're planning to run 600+ tomcat instances on it simultaneously. What
> caveats or pitfalls should we watch out for? Are there any hard limits that
> would prevent this from working as expected?
>
> I'm a lurker here.  I have some experience with Tomcat, but most of my
> experience is with other Apache projects.
>
> I'm hoping that my question mirrors what the experienced folks around here
> are thinking:
>
> For something like this ... why are you running so many multiple instances?
> Why not run one instance, or a few of them, and have each one handle many
> many webapps?  I bet you'll find that the overall memory requirements go
> way down, because there will be far fewer instances of Java running.
>
> Maybe you've got good reasons for the architecture you have chosen ...
> but it seems like a complete waste of resources to me.
>
> Thanks,
> Shawn

Hi Shawn --

We architected it that way many years ago and have always been happy with our 
early decisions. By running separate tomcats, each from a separate folder and 
listening on a unique port, we are able to perform maintenance on individual 
customers' instances, stop and start services, etc., without impacting other 
customers. We can customize tomcat settings and configurations to customer 
needs, fine-tune per-customer memory usage, run different versions of tomcat 
and the JDK, create customer-specific security sandboxes, per-customer filesystem 
permissions, samba shares, etc., and move entire instances around the farm to 
distribute load as needed. Errors and problems are easier to track down because 
the tomcat and webapp logs are in separate folders unique to each instance. If 
Customer A is having an issue with their application, we know that everything 
associated with their application, including all settings and logs, is in a 
folder dedicated to them.
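For readers wondering how this layout is usually realized: the common pattern is a single shared CATALINA_HOME (the Tomcat binaries) plus one CATALINA_BASE directory per customer instance, each with its own conf/, logs/, and webapps/. A hypothetical sketch (the paths are illustrative, not the actual farm layout):

```
/opt/tomcat-8.5/              # CATALINA_HOME: shared Tomcat binaries
/srv/customers/customerA/     # CATALINA_BASE: conf/, logs/, webapps/ for customer A
/srv/customers/customerB/     # CATALINA_BASE for customer B (own ports, own JVM options)
```

Each instance is then started with CATALINA_BASE pointed at its own directory, which is what makes per-customer ports, JVM versions, and log locations independent.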

-Eric







500 instances of tomcat on the same server

2021-06-25 Thread Eric Robinson
We can run 75 to 125 instances of tomcat on a single Linux server with 12 cores 
and 128GB RAM. It works great. CPU is around 25%, our JVMs are not throwing 
OOMEs, iowait is minimal, and network traffic is about 30Mbps. We're happy with 
the results.

Now we're upping the ante. We have a 48-core server with 1TB RAM, and we're 
planning to run 600+ tomcat instances on it simultaneously. What caveats or 
pitfalls should we watch out for? Are there any hard limits that would prevent 
this from working as expected?

-Eric












RE: Re-Use TCP Source Ports if the Socket is Unique?

2021-06-25 Thread Eric Robinson
> -Original Message-
> From: Mark H. Wood 
> Sent: Friday, June 25, 2021 12:30 PM
> To: users@tomcat.apache.org
> Subject: Re: Re-Use TCP Source Ports if the Socket is Unique?
>
> On Fri, Jun 25, 2021 at 12:46:03PM +, Eric Robinson wrote:
> > Olaf and Scott --
> >
> > Thanks to both of you for your comments. I may have asked my question
> poorly, since what you both described is the way I understand TCP to work.
> There is no correlation between an incoming connection to tomcat and its
> outgoing connection to a database backend, nor would I expect there to be.
> >
> > Perhaps a simpler way to ask my question is: when a server has multiple
> IPs, which one does tomcat use as its source IP when it initiates a three-way
> handshake with a remote machine?
> >
> > For example, suppose my server has IP addresses 10.0.0.1 and 10.0.0.2, and
> my tomcat connector looks like this...
> >
> > <Connector port="8080"
> >            protocol="HTTP/1.1"
> >            address="10.0.0.2"
> >            connectionTimeout="2"
> >            redirectPort="8443"
> >   />
> >
> > Tomcat is now listening on IP 10.0.0.2.
> >
> > But here's the question. If tomcat needs to initiate a TCP session to a
> remote machine (acting as a TCP client), will it use 10.0.0.1 or 10.0.0.2 as 
> the
> source IP of the outbound connection? I'm assuming it will use the same IP
> that the connector is configured to listen on.
>
> man 7 tcp
>
> A client uses 'connect' and doesn't need to set a local address.  Only a 
> service
> needs to declare its own address and port.
>
> The kernel routing database knows which distant hosts should be reachable
> via each local address.  'connect' should use this to pick an address that can
> reach the distant host, assign an unallocated port, and send SYN to request a
> connection.
>
> So the answer to your question is "it depends on the service host's address
> and what networks the interfaces for 10.0.0.1 and 10.0.0.2 can see."
>

Gotcha, that is clearer to me now. Fortunately, Christopher Schultz turned me 
on to the Connector/J localSocketAddress property, and now I can control which 
source IP my tomcat instances use when connecting to remote database servers.
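For anyone finding this thread in the archives: localSocketAddress is a standard Connector/J connection property and can be set directly in the JDBC URL, so no application code change is needed. A minimal sketch (host, database, and address are placeholders):

```
jdbc:mysql://dbhost.example.com:3306/appdb?localSocketAddress=10.0.0.2
```

The value must be an IP address (or resolvable hostname) already configured on a local interface of the machine running Tomcat.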

> --
> Mark H. Wood
> Lead Technology Analyst
>
> University Library
> Indiana University - Purdue University Indianapolis
> 755 W. Michigan Street
> Indianapolis, IN 46202
> 317-274-0749
> www.ulib.iupui.edu




RE: Re-Use TCP Source Ports if the Socket is Unique?

2021-06-25 Thread Eric Robinson
> -Original Message-
> From: Christopher Schultz 
> Sent: Friday, June 25, 2021 11:33 AM
> To: users@tomcat.apache.org
> Subject: Re: Re-Use TCP Source Ports if the Socket is Unique?
>
> Eric,
>
> On 6/24/21 21:14, Eric Robinson wrote:
> > I guess I may have answered this question for myself. At least I can
> > simulate it with ncat. Note that I have two ncat sessions open to the
> > same remote server using the same source port, but with different
> > source IPs.
> >
> > [root@testserver ~]# netstat -antp|grep ncat
> > tcp        0      0 192.168.11.215:3456 192.168.10.59:9000  ESTABLISHED 60946/ncat
> > tcp        0      0 192.168.10.58:3456  192.168.10.59:9000  ESTABLISHED 60920/ncat
>
>
> What is the command-line you used to invoke those nc processes?
> Presumably, you had to specifically tell each process which source interface
> to use.

Yes, as follows. I forced each ncat to use a specific source IP and source port.

 #  ncat -s 192.168.10.58 -p 3456 remoteserver 9000
 #  ncat -s 192.168.11.215 -p 3456 remoteserver 9000

>
> I haven't done this myself, but my guess would be that every outgoing
> connection would use the default network interface appropriate for that
> type of communication.
>
> The IP/interface Tomcat uses to bind and listen for connections has no
> bearing on which interface is chosen for outbound connections.
>

Gotcha. So I need to find a way to force all connections from a tomcat instance 
to remote IP "X" to always use source IP "Y." That's my challenge.
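At the socket level, forcing the source IP means binding the client socket to a chosen local address before connecting. A minimal, self-contained sketch using java.net.Socket's four-argument constructor; the loopback address stands in for the real "10.0.0.x" source IP, and this illustrates the mechanism only, not what Connector/J does internally:

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

public class ForcedSourceIp {
    public static void main(String[] args) throws IOException {
        // Stand-in for the remote server ("remote IP X"); in a real setup
        // this would be the database host.
        try (ServerSocket remote = new ServerSocket(0, 1, InetAddress.getByName("127.0.0.1"))) {
            // Bind the outbound socket to a chosen local address ("source IP Y")
            // and an ephemeral local port (0) before the TCP handshake starts.
            InetAddress sourceIp = InetAddress.getByName("127.0.0.1");
            try (Socket client = new Socket("127.0.0.1", remote.getLocalPort(), sourceIp, 0)) {
                System.out.println("source address: " + client.getLocalAddress().getHostAddress());
            }
        }
    }
}
```

Passing the local address to the constructor is exactly what localSocketAddress makes Connector/J do on your behalf.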


> > Is there any reason why tomcat should not be expected to work the same
> > way? And when I say tomcat, I really mean libraries like the mysql
> > odbc connector that tomcat uses.
>
> Oh, you're using Connector/J? Then you want this setting:
>
>   localSocketAddress
>
> https://dev.mysql.com/doc/connector-j/5.1/en/connector-j-connp-props-
> connection-authentication.html
>
> -chris
>

Oh my goodness. You, sir, are my ever-livin' hero. That WORKED. I tested it, 
and now my tomcat instances connect to the database server using whatever local 
IP I specify in the localSocketAddress property.

I am a happy man.

-Eric

> >> -Original Message-
> >> From: Eric Robinson 
> >> Sent: Thursday, June 24, 2021 3:19 PM
> >> To: Tomcat Users List 
> >> Subject: Re-Use TCP Source Ports if the Socket is Unique?
> >>
> >> Two quick questions.
> >>
> >> Question 1:
> >>
> >> When tomcat creates a TCP connection to a remote server (for example,
> >> a back-end database) tomcat is acting as the TCP client in that case.
> >> Does it use the IP it is listening on as the source IP for its outbound 
> >> client
> connection?
> >>
> >> For example, Server1 has three IPs: 10.0.0.1 (primary), and two
> >> additional IPs, 10.0.0.2 and 10.0.0.3. Tomcat is listening on
> >> 10.0.0.2. It receives a request that requires it to connect to a
> >> database server. When it creates a TCP connection the database server,
> which IP does it use as the source address?
> >>
> >> Question 2:
> >>
> >> Suppose you have two instances of tomcat on the same server. TomcatA
> >> is listening on 10.0.0.2 and TomcatB on 10.0.0.3. First, TomcatA
> >> establishes a connection to a remote server from its source IP 10.0.0.2,
> source port 3456.
> >> Can TomcatB, which is listening on a different IP, also establish a
> >> connection to the remote database server using the same source port
> >> 3456, given that the socket is unique (different source IP)?
> >>
> >> -Eric
> >>
> >>
> >>
> >>
> >>
> >>

RE: Re-Use TCP Source Ports if the Socket is Unique?

2021-06-25 Thread Eric Robinson

> -Original Message-
> From: Olaf Kock 
> Sent: Friday, June 25, 2021 8:07 AM
> To: users@tomcat.apache.org
> Subject: Re: Re-Use TCP Source Ports if the Socket is Unique?
>
>
> On 25.06.21 14:46, Eric Robinson wrote:
> > Olaf and Scott --
> >
> > Thanks to both of you for your comments. I may have asked my question
> poorly, since what you both described is the way I understand TCP to work.
> There is no correlation between an incoming connection to tomcat and its
> outgoing connection to a database backend, nor would I expect there to be.
> >
> > Perhaps a simpler way to ask my question is: when a server has multiple
> IPs, which one does tomcat use as its source IP when it initiates a three-way
> handshake with a remote machine?
> >
> > For example, suppose my server has IP addresses 10.0.0.1 and 10.0.0.2, and
> my tomcat connector looks like this...
> >
> > <Connector port="8080"
> >            protocol="HTTP/1.1"
> >            address="10.0.0.2"
> >            connectionTimeout="2"
> >            redirectPort="8443"
> >   />
> >
> > Tomcat is now listening on IP 10.0.0.2.
> >
> > But here's the question. If tomcat needs to initiate a TCP session to a
> remote machine (acting as a TCP client), will it use 10.0.0.1 or 10.0.0.2 as 
> the
> source IP of the outbound connection? I'm assuming it will use the same IP
> that the connector is configured to listen on.
> >
> Hi Eric,
>
> again: There's no correlation. Your question boils down to a context-free
> "which source IP does tomcat use for outgoing connections?". In fact, Tomcat
> doesn't use any. It just asks the runtime environment (ultimately I'd expect
> the OS) for a connection to a particular destination, then it uses that.
>
> How the connection is then established will depend on
>
> * available network adapters
> * best route to the target address
> * OS or network configuration
>
> It will /not/ depend on any of Tomcat's Connector-configurations
> whatsoever
>

Got it. Then, given a tomcat server with one NIC and two IP addresses, 10.0.0.2 
and 10.0.0.3, when tomcat connects to a server on the same subnet at 10.0.0.50, 
what logic does the OS use to select the source IP, all else being equal? 
Obviously neither IP has a routing advantage.

> Olaf
>
>
> >> -Original Message-
> >> From: Olaf Kock 
> >> Sent: Friday, June 25, 2021 3:01 AM
> >> To: users@tomcat.apache.org
> >> Subject: Re: Re-Use TCP Source Ports if the Socket is Unique?
> >>
> >>
> >> On 25.06.21 05:19, Eric Robinson wrote:
> >>> Thanks for the feedback, Daniel.
> >>>
> >>> I guess the answer depends on whether the socket libraries use the
> >>> tomcat
> >> listening port as the source IP. If you have three tomcat instances
> >> listening on three different IPs, each instance should be able to
> >> open a client connection using the same source port, as long as each
> >> tomcat uses its listening IP as the source IP of the socket.
> >>> That's the part I'm still not sure about.
> >> My expectation is that database connections do not have any
> >> correlation with the listening port: Technically, DB connection pools
> >> can be shared across all contained Hosts and Connectors /within a
> >> single tomcat/, and when multiple processes are added to the game, it
> doesn't really change anything.
> >>
> >> In fact, it's not uncommon that there's a public facing network
> >> adapter, where a http-connector listens, but a completely different
> >> network adapter for any backend communication - e.g. to the database.
> >> All that I expect a database driver to do is to specify where it
> >> wants to connect to, and the OS figures out how that connection needs to
> be routed.
> >> That's utterly independent of any http connection that comes in to
> >> the same process.
> >>
> >> So: Don't expect any correlation, and you're safe.
> >>
> >> (Note: There /may/ be ways to configure a db-driver to specify a
> >> source address, but I'd expect that rather to add a potential failure
> >> rather than anything that I'd want to control. If you interpret such a
> situation differently:
> >> Please elaborate)
> >>
> >> Best,
> >>
> >> Olaf
> >>
> >>
> >>
> >> -
> >> To unsubscribe, e-mail: users-

RE: Re-Use TCP Source Ports if the Socket is Unique?

2021-06-25 Thread Eric Robinson
Olaf and Scott --

Thanks to both of you for your comments. I may have asked my question poorly, 
since what you both described is the way I understand TCP to work. There is no 
correlation between an incoming connection to tomcat and its outgoing 
connection to a database backend, nor would I expect there to be.

Perhaps a simpler way to ask my question is: when a server has multiple IPs, 
which one does tomcat use as its source IP when it initiates a three-way 
handshake with a remote machine?

For example, suppose my server has IP addresses 10.0.0.1 and 10.0.0.2, and my 
tomcat connector looks like this...

<Connector port="8080"
           protocol="HTTP/1.1"
           address="10.0.0.2"
           connectionTimeout="2"
           redirectPort="8443"
  />

Tomcat is now listening on IP 10.0.0.2.

But here's the question. If tomcat needs to initiate a TCP session to a remote 
machine (acting as a TCP client), will it use 10.0.0.1 or 10.0.0.2 as the 
source IP of the outbound connection? I'm assuming it will use the same IP that 
the connector is configured to listen on.


> -Original Message-
> From: Olaf Kock 
> Sent: Friday, June 25, 2021 3:01 AM
> To: users@tomcat.apache.org
> Subject: Re: Re-Use TCP Source Ports if the Socket is Unique?
>
>
> On 25.06.21 05:19, Eric Robinson wrote:
> > Thanks for the feedback, Daniel.
> >
> > I guess the answer depends on whether the socket libraries use the tomcat
> listening port as the source IP. If you have three tomcat instances listening 
> on
> three different IPs, each instance should be able to open a client connection
> using the same source port, as long as each tomcat uses its listening IP as 
> the
> source IP of the socket.
> >
> > That's the part I'm still not sure about.
>
> My expectation is that database connections do not have any correlation
> with the listening port: Technically, DB connection pools can be shared across
> all contained Hosts and Connectors /within a single tomcat/, and when
> multiple processes are added to the game, it doesn't really change anything.
>
> In fact, it's not uncommon that there's a public facing network adapter,
> where a http-connector listens, but a completely different network adapter
> for any backend communication - e.g. to the database. All that I expect a
> database driver to do is to specify where it wants to connect to, and the OS
> figures out how that connection needs to be routed.
> That's utterly independent of any http connection that comes in to the same
> process.
>
> So: Don't expect any correlation, and you're safe.
>
> (Note: There /may/ be ways to configure a db-driver to specify a source
> address, but I'd expect that rather to add a potential failure rather than
> anything that I'd want to control. If you interpret such a situation 
> differently:
> Please elaborate)
>
> Best,
>
> Olaf
>
>
>
> -
> To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
> For additional commands, e-mail: users-h...@tomcat.apache.org



RE: Re-Use TCP Source Ports if the Socket is Unique?

2021-06-24 Thread Eric Robinson
Thanks for the feedback, Daniel.

I guess the answer depends on whether the socket libraries use the tomcat 
listening IP as the source IP. If you have three tomcat instances listening 
on three different IPs, each instance should be able to open a client 
connection using the same source port, as long as each tomcat uses its 
listening IP as the source IP of the socket.

That's the part I'm still not sure about.

> -Original Message-
> From: Daniel Baktiar 
> Sent: Thursday, June 24, 2021 9:16 PM
> To: Tomcat Users List 
> Subject: Re: Re-Use TCP Source Ports if the Socket is Unique?
>
> Hi Eric,
>
> It should behave the same way. The socket client application will be assigned
> an ephemeral port.
>
> On Fri, Jun 25, 2021 at 9:14 AM Eric Robinson 
> wrote:
>
> > I guess I may have answered this question for myself. At least I can
> > simulate it with ncat. Note that I have two ncat sessions open to the
> > same remote server using the same source port, but with different source
> IPs.
> >
> > [root@testserver ~]# netstat -antp|grep ncat
> > tcp        0      0 192.168.11.215:3456 192.168.10.59:9000  ESTABLISHED 60946/ncat
> > tcp        0      0 192.168.10.58:3456  192.168.10.59:9000  ESTABLISHED 60920/ncat
> >
> > Is there any reason why tomcat should not be expected to work the same
> > way? And when I say tomcat, I really mean libraries like the mysql
> > odbc connector that tomcat uses.
> >
> >
> > > -Original Message-
> > > From: Eric Robinson 
> > > Sent: Thursday, June 24, 2021 3:19 PM
> > > To: Tomcat Users List 
> > > Subject: Re-Use TCP Source Ports if the Socket is Unique?
> > >
> > > Two quick questions.
> > >
> > > Question 1:
> > >
> > > When tomcat creates a TCP connection to a remote server (for
> > > example, a back-end database) tomcat is acting as the TCP client in
> > > that case. Does
> > it use
> > > the IP it is listening on as the source IP for its outbound client
> > connection?
> > >
> > > For example, Server1 has three IPs: 10.0.0.1 (primary), and two
> > additional
> > > IPs, 10.0.0.2 and 10.0.0.3. Tomcat is listening on 10.0.0.2. It
> > > receives
> > a request
> > > that requires it to connect to a database server. When it creates a
> > > TCP connection the database server, which IP does it use as the
> > > source
> > address?
> > >
> > > Question 2:
> > >
> > > Suppose you have two instances of tomcat on the same server. TomcatA
> > > is listening on 10.0.0.2 and TomcatB on 10.0.0.3. First, TomcatA
> > establishes a
> > > connection to a remote server from its source IP 10.0.0.2, source
> > > port
> > 3456.
> > > Can TomcatB, which is listening on a different IP, also establish a
> > connection
> > > to the remote database server using the same source port 3456, given
> > > that the socket is unique (different source IP)?
> > >
> > > -Eric
> > >
> > >
> > >
> > >
> > >
> > >
> >
> > --

RE: Re-Use TCP Source Ports if the Socket is Unique?

2021-06-24 Thread Eric Robinson
I guess I may have answered this question for myself. At least I can simulate 
it with ncat. Note that I have two ncat sessions open to the same remote server 
using the same source port, but with different source IPs.

[root@testserver ~]# netstat -antp|grep ncat
tcp        0      0 192.168.11.215:3456 192.168.10.59:9000  ESTABLISHED 60946/ncat
tcp        0      0 192.168.10.58:3456  192.168.10.59:9000  ESTABLISHED 60920/ncat

Is there any reason why tomcat should not be expected to work the same way? And 
when I say tomcat, I really mean libraries like the MySQL JDBC connector that 
tomcat uses.
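The ncat experiment can be reproduced in plain Java to show the same 4-tuple rule. A self-contained sketch (Linux-specific: it relies on the whole 127.0.0.0/8 block being bound to the loopback interface, and assumes local port 3456 is free):

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

public class SamePortTwoIps {
    public static void main(String[] args) throws IOException {
        // Loopback stand-in for the remote server (192.168.10.59:9000 above).
        try (ServerSocket server = new ServerSocket(0)) {
            int dstPort = server.getLocalPort();
            int srcPort = 3456; // assumed free on both source addresses
            // Two connections sharing one source port but differing in source IP:
            // the (srcIP, srcPort, dstIP, dstPort) 4-tuples remain unique.
            try (Socket a = new Socket("127.0.0.1", dstPort, InetAddress.getByName("127.0.0.2"), srcPort);
                 Socket b = new Socket("127.0.0.1", dstPort, InetAddress.getByName("127.0.0.3"), srcPort)) {
                System.out.println(a.getLocalAddress().getHostAddress() + ":" + a.getLocalPort());
                System.out.println(b.getLocalAddress().getHostAddress() + ":" + b.getLocalPort());
            }
        }
    }
}
```

Both connects succeed because TCP only requires the full 4-tuple to be unique, not the source port alone.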


> -Original Message-
> From: Eric Robinson 
> Sent: Thursday, June 24, 2021 3:19 PM
> To: Tomcat Users List 
> Subject: Re-Use TCP Source Ports if the Socket is Unique?
>
> Two quick questions.
>
> Question 1:
>
> When tomcat creates a TCP connection to a remote server (for example, a
> back-end database) tomcat is acting as the TCP client in that case. Does it 
> use
> the IP it is listening on as the source IP for its outbound client connection?
>
> For example, Server1 has three IPs: 10.0.0.1 (primary), and two additional
> IPs, 10.0.0.2 and 10.0.0.3. Tomcat is listening on 10.0.0.2. It receives a 
> request
> that requires it to connect to a database server. When it creates a TCP
> connection the database server, which IP does it use as the source address?
>
> Question 2:
>
> Suppose you have two instances of tomcat on the same server. TomcatA is
> listening on 10.0.0.2 and TomcatB on 10.0.0.3. First, TomcatA establishes a
> connection to a remote server from its source IP 10.0.0.2, source port 3456.
> Can TomcatB, which is listening on a different IP, also establish a connection
> to the remote database server using the same source port 3456, given that
> the socket is unique (different source IP)?
>
> -Eric
>
>
>
>
>
>




Re-Use TCP Source Ports if the Socket is Unique?

2021-06-24 Thread Eric Robinson
Two quick questions.

Question 1:

When tomcat creates a TCP connection to a remote server (for example, a 
back-end database) tomcat is acting as the TCP client in that case. Does it use 
the IP it is listening on as the source IP for its outbound client connection?

For example, Server1 has three IPs: 10.0.0.1 (primary), and two additional IPs, 
10.0.0.2 and 10.0.0.3. Tomcat is listening on 10.0.0.2. It receives a request 
that requires it to connect to a database server. When it creates a TCP 
connection to the database server, which IP does it use as the source address?

Question 2:

Suppose you have two instances of tomcat on the same server. TomcatA is 
listening on 10.0.0.2 and TomcatB on 10.0.0.3. First, TomcatA establishes a 
connection to a remote server from its source IP 10.0.0.2, source port 3456. 
Can TomcatB, which is listening on a different IP, also establish a connection 
to the remote database server using the same source port 3456, given that the 
socket is unique (different source IP)?

-Eric








RE: Wait... NULL address in java.net.BindException: Address already in use (Bind failed) ???

2021-03-17 Thread Eric Robinson




> -Original Message-
> From: Christopher Schultz 
> Sent: Wednesday, March 17, 2021 3:13 PM
> To: users@tomcat.apache.org
> Subject: Re: Wait... NULL address in java.net.BindException: Address already
> in use (Bind failed)  ???
>
> Eric and Martin,
>
> On 3/17/21 15:35, Martin Grigorov wrote:
> > On Wed, Mar 17, 2021, 20:27 Eric Robinson 
> wrote:
> >
> >>> From: Martin Grigorov 
> >>> Sent: Wednesday, March 17, 2021 12:45 PM
> >>> To: Tomcat Users List 
> >>> Subject: Re: Wait... NULL address in java.net.BindException: Address
> >> already
> >>> in use (Bind failed)  ???
> >>>
> >>> Hi,
> >>>
> >>> On Wed, Mar 17, 2021, 19:34 Eric Robinson 
> >>> wrote:
> >>>
> >>>> Getting error:
> >>>>
> >>>> java.net.BindException: Address already in use (Bind failed)
> >>>> <null>:3787
> >>>>
> >>>
> >>> Please paste more lines of the exception.
> >>> Also please tell us which version of JDK/JRE you use.
> >>> This exception is very cryptic and does not usually tell which
> >>> address
> >> is in use.
> >>> I.e. 3787 is not the port, as you might think. Most probably it is a
> >> line in some
> >>> class.
> >>>
> >>
> >> Tomcat: Apache Tomcat/8.5.51
> >> JVM: 1.8.0_241-b08
> >>
> >> The following error appears in catalina.out under tomcat 8. It does
> >> not mention the null. We tried it under tomcat 7 as well, and that is
> >> where it mentions the null.
> >>
> >> 17-Mar-2021 11:12:54.039 INFO [main]
> >> org.apache.coyote.AbstractProtocol.init Initializing ProtocolHandler
> >> ["http-nio-3787"]
> >> 17-Mar-2021 11:12:54.048 SEVERE [main]
> >> org.apache.catalina.core.StandardService.initInternal Failed to
> >> initialize connector [Connector[HTTP/1.1-3787]]
> >>
> >
> > This line says that 3787 is the port indeed.
> > Are you sure it is not bound?
>
> Also, please post your <Connector> elements from conf/server.xml.
>
> You mentioned "<null>:3787" in your error message but I don't see that in
> the exception. Are you sure you are posting everything?
>

Hi Chris --

I mentioned in the email that the null reference appears in the catalina log 
when we use tomcat 7. It does not appear when we use tomcat 8. Although it 
fails to bind either way.

Here's the connector.



> -chris
>
> -
> To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
> For additional commands, e-mail: users-h...@tomcat.apache.org



RE: Wait... NULL address in java.net.BindException: Address already in use (Bind failed) ???

2021-03-17 Thread Eric Robinson
> -Original Message-
> From: Martin Grigorov 
> Sent: Wednesday, March 17, 2021 2:35 PM
> To: Tomcat Users List 
> Subject: Re: Wait... NULL address in java.net.BindException: Address already
> in use (Bind failed)  ???
>
> On Wed, Mar 17, 2021, 20:27 Eric Robinson 
> wrote:
>
> > > From: Martin Grigorov 
> > > Sent: Wednesday, March 17, 2021 12:45 PM
> > > To: Tomcat Users List 
> > > Subject: Re: Wait... NULL address in java.net.BindException: Address
> > already
> > > in use (Bind failed)  ???
> > >
> > > Hi,
> > >
> > > On Wed, Mar 17, 2021, 19:34 Eric Robinson 
> > > wrote:
> > >
> > > > Getting error:
> > > >
> > > > java.net.BindException: Address already in use (Bind failed)
> > > > :3787
> > > >
> > >
> > > Please paste more lines of the exception.
> > > Also please tell us which version of JDK/JRE you use.
> > > This exception is very cryptic and does not usually tell which
> > > address
> > is in use.
> > > I.e. 3787 is not the port, as you might think. Most probably it is a
> > line in some
> > > class.
> > >
> >
> > Tomcat: Apache Tomcat/8.5.51
> > JVM: 1.8.0_241-b08
> >
> > The following error appears in catalina.out under tomcat 8. It does
> > not mention the null. We tried it under tomcat 7 as well, and that is
> > where it mentions the null.
> >
> > 17-Mar-2021 11:12:54.039 INFO [main]
> > org.apache.coyote.AbstractProtocol.init Initializing ProtocolHandler
> > ["http-nio-3787"]
> > 17-Mar-2021 11:12:54.048 SEVERE [main]
> > org.apache.catalina.core.StandardService.initInternal Failed to
> > initialize connector [Connector[HTTP/1.1-3787]]
> >
>
> This line says that 3787 is the port indeed.
> Are you sure it is not bound?
>

100% sure, unless there's an invisible process using it. netstat and fuser both 
show nothing.
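
For readers following along: when no listener is visible to netstat or fuser, the usual remaining suspects are a socket lingering in TIME_WAIT without SO_REUSEADDR, or an IPv6 wildcard listener shadowing the IPv4 port. The exception itself is easy to reproduce; the following Python sketch (addresses and ports are illustrative, not from this thread) shows the same OS-level error Tomcat surfaces as java.net.BindException:

```python
import socket, errno

# First listener takes a port (the OS picks a free one).
s1 = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s1.bind(("127.0.0.1", 0))
port = s1.getsockname()[1]
s1.listen(1)

# A second bind to the same address:port fails with EADDRINUSE,
# the errno behind "Address already in use (Bind failed)".
s2 = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
try:
    s2.bind(("127.0.0.1", port))
    err = None
except OSError as e:
    err = e.errno

assert err == errno.EADDRINUSE

s1.close()
s2.close()
```

Note that `netstat -an` filtered to IPv4 alone can miss an IPv6 `:::port` listener that still conflicts; `ss -lntp` shows both families with owning processes.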
>
> > org.apache.catalina.LifecycleException: Protocol handler initialization failed
> >         at org.apache.catalina.connector.Connector.initInternal(Connector.java:1032)
> >         at org.apache.catalina.util.LifecycleBase.init(LifecycleBase.java:136)
> >         at org.apache.catalina.core.StandardService.initInternal(StandardService.java:552)
> >         at org.apache.catalina.util.LifecycleBase.init(LifecycleBase.java:136)
> >         at org.apache.catalina.core.StandardServer.initInternal(StandardServer.java:848)
> >         at org.apache.catalina.util.LifecycleBase.init(LifecycleBase.java:136)
> >         at org.apache.catalina.startup.Catalina.load(Catalina.java:639)
> >         at org.apache.catalina.startup.Catalina.load(Catalina.java:662)
> >         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> >         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> >         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> >         at java.lang.reflect.Method.invoke(Method.java:498)
> >         at org.apache.catalina.startup.Bootstrap.load(Bootstrap.java:303)
> >         at org.apache.catalina.startup.Bootstrap.main(Bootstrap.java:473)
> > Caused by: java.net.BindException: Address already in use
> >         at sun.nio.ch.Net.bind0(Native Method)
> >         at sun.nio.ch.Net.bind(Net.java:433)
> >         at sun.nio.ch.Net.bind(Net.java:425)
> >         at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:220)
> >         at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:85)
> >         at org.apache.tomcat.util.net.NioEndpoint.bind(NioEndpoint.java:221)
> >         at org.apache.tomcat.util.net.AbstractEndpoint.init(AbstractEndpoint.java:1118)
> >         at org.apache.tomcat.util.net.AbstractJsseEndpoint.init(AbstractJsseEndpoint.java:223)
> >         at org.apache.coyote.AbstractProtocol.init(AbstractProtocol.java:587)
> >         at org.apache.coyote.http11.AbstractHttp11Protocol.init(AbstractHttp11Protocol.java:74)
> >         at org.apache.catalina.connector.Connector.initInternal(Connector.

RE: Wait... NULL address in java.net.BindException: Address already in use (Bind failed) ???

2021-03-17 Thread Eric Robinson
> From: Martin Grigorov 
> Sent: Wednesday, March 17, 2021 12:45 PM
> To: Tomcat Users List 
> Subject: Re: Wait... NULL address in java.net.BindException: Address already
> in use (Bind failed)  ???
>
> Hi,
>
> On Wed, Mar 17, 2021, 19:34 Eric Robinson 
> wrote:
>
> > Getting error:
> >
> > java.net.BindException: Address already in use (Bind failed)
> > :3787
> >
>
> Please paste more lines of the exception.
> Also please tell us which version of JDK/JRE you use.
> This exception is very cryptic and does not usually tell which address is in 
> use.
> I.e. 3787 is not the port, as you might think. Most probably it is a line in 
> some
> class.
>

Tomcat: Apache Tomcat/8.5.51
JVM: 1.8.0_241-b08

The following error appears in catalina.out under tomcat 8. It does not mention 
the null. We tried it under tomcat 7 as well, and that is where it mentions the 
null.

17-Mar-2021 11:12:54.039 INFO [main] org.apache.coyote.AbstractProtocol.init 
Initializing ProtocolHandler ["http-nio-3787"]
17-Mar-2021 11:12:54.048 SEVERE [main] 
org.apache.catalina.core.StandardService.initInternal Failed to initialize 
connector [Connector[HTTP/1.1-3787]]
org.apache.catalina.LifecycleException: Protocol handler initialization failed
        at org.apache.catalina.connector.Connector.initInternal(Connector.java:1032)
        at org.apache.catalina.util.LifecycleBase.init(LifecycleBase.java:136)
        at org.apache.catalina.core.StandardService.initInternal(StandardService.java:552)
        at org.apache.catalina.util.LifecycleBase.init(LifecycleBase.java:136)
        at org.apache.catalina.core.StandardServer.initInternal(StandardServer.java:848)
        at org.apache.catalina.util.LifecycleBase.init(LifecycleBase.java:136)
        at org.apache.catalina.startup.Catalina.load(Catalina.java:639)
        at org.apache.catalina.startup.Catalina.load(Catalina.java:662)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.catalina.startup.Bootstrap.load(Bootstrap.java:303)
        at org.apache.catalina.startup.Bootstrap.main(Bootstrap.java:473)
Caused by: java.net.BindException: Address already in use
        at sun.nio.ch.Net.bind0(Native Method)
        at sun.nio.ch.Net.bind(Net.java:433)
        at sun.nio.ch.Net.bind(Net.java:425)
        at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:220)
        at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:85)
        at org.apache.tomcat.util.net.NioEndpoint.bind(NioEndpoint.java:221)
        at org.apache.tomcat.util.net.AbstractEndpoint.init(AbstractEndpoint.java:1118)
        at org.apache.tomcat.util.net.AbstractJsseEndpoint.init(AbstractJsseEndpoint.java:223)
        at org.apache.coyote.AbstractProtocol.init(AbstractProtocol.java:587)
        at org.apache.coyote.http11.AbstractHttp11Protocol.init(AbstractHttp11Protocol.java:74)
        at org.apache.catalina.connector.Connector.initInternal(Connector.java:1030)
        ... 13 more


>
> > I know how to fix the infamous "Address already in use (Bind failed)"
> > problem when there is another process already listening on a port.
> > However, I have confirmed with netstat and fuser that there is no
> > other process listening on that port. Could the problem be that the
> > host address is null for some reason? I don't recall seeing that
> > before, and Google diving came up dry.
> >
> > -Eric
> >
> >
> >

Wait... NULL address in java.net.BindException: Address already in use (Bind failed) ???

2021-03-17 Thread Eric Robinson
Getting error:

java.net.BindException: Address already in use (Bind failed) :3787

I know how to fix the infamous "Address already in use (Bind failed)" problem 
when there is another process already listening on a port. However, I have 
confirmed with netstat and fuser that there is no other process listening on 
that port. Could the problem be that the host address is null for some reason? 
I don't recall seeing that before, and Google diving came up dry.

-Eric





RE: Weirdest Tomcat Behavior Ever?

2020-12-04 Thread Eric Robinson
> -Original Message-
> From: Christopher Schultz 
> Sent: Wednesday, December 2, 2020 10:21 AM
> To: users@tomcat.apache.org
> Subject: Re: Weirdest Tomcat Behavior Ever?
>
> Mark,
>
> On 11/26/20 05:14, Mark Thomas wrote:
> > On 26/11/2020 04:57, Christopher Schultz wrote:
> >
> > 
> >
> >>> After a normal clean-up the parent then calls close on the two file
> >>> descriptors associated with the pipe for a second time."
> >>
> >> So the child cleans them up AND the parent cleans them up? Or the
> >> parent cleans when up twice? The child should be able to call close()
> >> as many times as it wants and only poison itself. Does the child
> >> process ever exit()?
> >
> > With the caveat that some of the below is educated guess work because
> > the strace was configured to look at the events we were interested in
> > so I am having to fill in some of the gaps.
> >
> > The parent "process" is a Java thread currently in native code in a
> > 3rd party library.
> >
> > The parent creates a pipe which comes with two file descriptors. One
> > for the read end, one for the write end.
> >
> > The parent process then forks. The child process now has copies of the
> > two file descriptors. (see man fork for more details).
> >
> > The parent closes its fd for the write end of the pipe. The child
> > closes its fd for the read end of the pipe.
> >
> > The child writes to the pipe and the parent reads from it.
> >
> > The child exits and closes its fd for the write end of the pipe.
> >
> > The parent closes its fd for the read end of the pipe.
> >
> > At this point all is good. All the closes completely cleanly.
> > Everything has been shutdown properly.
>
> +1
>
> > The two fds allocated to the parent are back in the pool any may be
> > reused by other threads in the JVM.
> >
> > The parent then attempts to close the fds associated with the pipe
> > again. For each fd, if it has not been reallocated an EBADF error
> > occurs. If it has been reallocated, it is closed thereby breaking
> > whatever was using it legitimately.
>
> Thanks for clarifying this. I was confused and thinking you were saying that
> the child process was the one breaking things, but it's the parent process.
> Since the parent is the JVM (the long-running process), all hell breaks loose.
>
> >> The parent process must be the JVM process, right? And the parent
> >> process (this native library, running within the JVM process)
> >> double-closes file descriptors, with some measurable delay?
> >
> > Correct. In the instance where I did most of the log analysis the
> > delay was about 0.1 seconds. In other logs I did observe longer delays
> > with what looked like a very different failure mode.
> >
> >> That's the
> >> only way this could make sense. And of course it mess mess everything
> >> up in really *really* unpredictable ways.
> >
> > Yep.
>
> Fascinating.
>
> Thanks for the wild ride, Eric and Mark :)
>
> -chris
>

In case anyone thought I had forgotten about all this... sorry, no such luck. 
You’re stuck with me!

Things have been quieter for the past several days because, at Mark's 
suggestion, we changed the nginx proxy and tomcat configurations to make them 
more tolerant of the underlying condition. Specifically, we configured nginx to 
use HTTP/1.1 instead of 1.0, enabled keepalives, and set 
maxKeepAliveRequests="-1"  in server.xml. This reduced the frequency of the 
issue.
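
For reference, the proxy-side changes described above can be sketched as nginx configuration. The upstream name and address below are placeholders, not the actual deployment; the matching Tomcat change is maxKeepAliveRequests="-1" on the <Connector> in server.xml.

```nginx
# Speak HTTP/1.1 to the backend and reuse upstream connections
upstream tomcat_backend {
    server 127.0.0.1:8080;        # placeholder backend address
    keepalive 32;                 # idle keepalive connections to cache
}

server {
    location / {
        proxy_pass http://tomcat_backend;
        proxy_http_version 1.1;           # nginx defaults to 1.0 upstream
        proxy_set_header Connection "";   # clear "Connection: close"
    }
}
```

Without `proxy_http_version 1.1` and an empty `Connection` header, nginx closes the upstream connection after every request, which is exactly the churn that made the underlying fd bug bite so often.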

The vendor was unable to dispute the quality of the analysis, so they accepted 
that the third-party component (which I can now name: PDFTron) could be the 
root cause. They disabled use of the component, and that seems to have quieted 
things down a bit more. We are still seeing occasional session disconnects, so 
it is possible that the component is leveraged for more than one function in 
their software and it was only disabled for a subset of them. The big 
difference now is that, instead of seeing a GET request from the proxy followed 
by a FIN from the upstream, now it’s a GET followed by an RST.

We'll begin the packet captures and straces again on Monday. Mark, besides 
network and fd tracing, is there anything else you want from strace to make the 
analysis more certain?

-Eric
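
As an aside, the double-close failure mode from the quoted analysis (a parent closing its pipe file descriptors a second time, after the numbers may already have been handed to another thread) can be sketched in Python. This is a simplified stand-in for the native code's behavior, not the library itself:

```python
import errno
import os

r, w = os.pipe()                 # parent creates the pipe: two fds
pid = os.fork()
if pid == 0:                     # child process
    os.close(r)                  # child closes its copy of the read end
    os.write(w, b"data")
    os.close(w)                  # child closes the write end and exits
    os._exit(0)

os.close(w)                      # parent closes its copy of the write end
assert os.read(r, 4) == b"data"  # parent reads what the child wrote
os.close(r)                      # parent closes the read end: all clean
os.waitpid(pid, 0)

# The bug: closing the same fd number a second time. At best it fails
# with EBADF; at worst the number has been reallocated to another
# thread's socket, and that socket is silently destroyed.
try:
    os.close(r)
    double_close_error = None
except OSError as e:
    double_close_error = e.errno

assert double_close_error == errno.EBADF
```

In this tiny single-threaded sketch the second close can only raise EBADF; in a busy JVM, where fd numbers are recycled constantly, it is the reallocated-fd case that causes the unpredictable breakage described above.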



RE: Weirdest Tomcat Behavior Ever?

2020-11-25 Thread Eric Robinson
> -Original Message-
> From: Mark Thomas 
> Sent: Tuesday, November 24, 2020 8:57 AM
> To: users@tomcat.apache.org
> Subject: Re: Weirdest Tomcat Behavior Ever?
>
> On 24/11/2020 14:11, Christopher Schultz wrote:
> > On 11/20/20 11:08, Mark Thomas wrote:
>
> 
>
> >> A second look at the strace file from yesterday provided hard
> >> evidence of a native library mis-using file descriptors and strong
> >> circumstantial evidence linking that same library to a specific
> >> instance of the observed failure.
> >
> > Interesting. How can you tell it's a library instead of, for example,
> > the JVM itself (which is of course itself a set of native libraries).
>
> strace shows the .so file being accessed, then we see the process forking, a
> pipe set up between the parent and child and the child calling execve on the
> same .so file. After a normal clean-up the parent then calls close on the two
> file descriptors associated with the pipe for a second time.
>
> I'm as sure as I can be without access to the source code for the .so file 
> that it
> is mis-handling file descriptors.
>
> > I'm assuming that when you say "native library" you mean "some native
> > component which is not a part of the core JVM".
>
> The .so file in question is not part of the JVM. It appears to be a 
> third-party
> native library that ships as part of the commercial web application where the
> original issue is observed.
>
> >> TL;DR, an issue in an external library, not a Tomcat issue.
>
> 
>
> > So does this really look like a (pretty serious) bug in a native
> > library? Any idea which one?
>
> I'm reasonably sure but I had to make a couple of assumptions based on file
> paths to ID the library. I've passed that info to Eric but until it is 
> confirmed it
> doesn't seem right to name it on list.
>
> Mark
>

The full evidence package was submitted to the application vendor this morning, 
including all relevant logs, packet captures, strace files, and the 
accompanying analysis (courtesy of Mark) which seems pretty conclusive. We're 
eager to hear their feedback. That said, I'm not too shy about mentioning the 
names of the suspected libraries, as long as we're clear that the cause is not 
confirmed, and specific about the evidence that points in that direction. I may 
do so in a follow-up posting. After all, we already know the names of the other 
major components involved--tomcat, java, and the jdbc connector. Naming the 
suspect libraries may be a service to someone else who has a gut feeling about 
them but hasn't seen solid evidence to support their concerns.

-Eric






RE: Weirdest Tomcat Behavior Ever?

2020-11-25 Thread Eric Robinson
> -Original Message-
> From: Christopher Schultz 
> Sent: Tuesday, November 24, 2020 8:11 AM
> To: users@tomcat.apache.org
> Subject: Re: Weirdest Tomcat Behavior Ever?
>
> Mark,
>
> On 11/20/20 11:08, Mark Thomas wrote:
> > On 20/11/2020 15:43, Eric Robinson wrote:
> >>> From: Mark Thomas 
> >
> >
> > 
> >
> >>> This information might turn out to be unnecessary. I have been
> >>> re-looking at the previous logs and I think I have enough evidence
> >>> to ID a root cause. I'll email off-list as I need to quote logs to prove 
> >>> my
> point.
> >>>
> >>
> >> I'll be watching my inbox like a hawk!
> >
> > Sent.
> >
> > For the curious, the analysis indicated either JVM code or a native
> > library was incorrectly closing the file descriptor associated with
> > the socket being used by an in progress HTTP request.
>
> FWIW, Connector/J is a Type 4 JDBC Driver (aka "pure Java"), so no native
> components.
>
> I'm not sure how it would be able to close the connection.
>
> Also, v5.0.8 is like 13 years old. Eric, you guys *really* have to upgrade 
> that.
> Somewhat surprisingly, there are CVEs against that library which allow
> unauthenticated remote attackers to take-over the MySQL client
> connections opened by that library.
>

Chris, I'm in full agreement with you on that. We'd love to update the 
connector but we are under vendor constraints. They only support certain 
versions.

> > A second look at the strace file from yesterday provided hard evidence
> > of a native library mis-using file descriptors and strong
> > circumstantial evidence linking that same library to a specific
> > instance of the observed failure.
>
> Interesting. How can you tell it's a library instead of, for example, the JVM
> itself (which is of course itself a set of native libraries).
> I'm assuming that when you say "native library" you mean "some native
> component which is not a part of the core JVM".
>
> > TL;DR, an issue in an external library, not a Tomcat issue.
>
> I've recently been trying to optimize IO throughput of certain web-to-
> database (and vice-versa) operations in my own web applications.
> The gist is that we are wrapping request InputStreams or JDBC InputStreams
> (depending upon whether we are reading from request->db or
> db->response) to allow streaming directly in or out of the database with
> a minimum of buffering.
>
> If such a scheme were to be implemented particularly badly (e.g.
> allowing the database code direct-access to the request and response
> objects instead of just the stream(s)), it would certainly be possible
> for such an implementation to close the response's output stream before
> the servlet was expecting it.
>
> But such an implementation would be caught red-handed if a simple
> wrapper around the response stream were to be installed and simply log
> all the calls to close(), which I think was one of your first debugging
> steps, here.
>
> So does this really look like a (pretty serious) bug in a native
> library? Any idea which one?
>
> -chris
>



RE: Weirdest Tomcat Behavior Ever?

2020-11-20 Thread Eric Robinson

> -Original Message-
> From: Mark Thomas 
> Sent: Friday, November 20, 2020 9:32 AM
> To: users@tomcat.apache.org
> Subject: Re: Weirdest Tomcat Behavior Ever?
>
> On 20/11/2020 14:55, Eric Robinson wrote:
> >> From: Mark Thomas 
> >> It looks like you are using MySQL from the data in the strace log. Is
> >> that correct? What I really need to know is which version of the
> >> MySQL JDBC driver (Connector/J) are you using? I'm trying to figure
> >> out where the root cause might be (or if it is a known issue fixed in a 
> >> later
> release).
> >>
> >
> > Yes, MySQL 5.6.41.
> >
> > Not sure how to tell the JDBC version. I unzipped the tomcat-jdbc.jar file
> and inspected the MANIFEST file, which shows the following. Is that what
> you need?
>
> That is one of the database connection pools provided by Tomcat. There
should be a JAR somewhere called mysql-connector-java-x.y.z.jar or
> something close to that. It will either be in Tomcat's lib directory or under
> WEB-INF/lib in the webapp.
>

mysql-connector-java-commercial-5.0.8-bin.jar

> This information might turn out to be unnecessary. I have been re-looking at
> the previous logs and I think I have enough evidence to ID a root cause. I'll
> email off-list as I need to quote logs to prove my point.
>

I'll be watching my inbox like a hawk!

> Mark
>



RE: Weirdest Tomcat Behavior Ever?

2020-11-20 Thread Eric Robinson
> -Original Message-
> From: Mark Thomas 
> Sent: Friday, November 20, 2020 3:17 AM
> To: users@tomcat.apache.org
> Subject: Re: Weirdest Tomcat Behavior Ever?
>
> On 19/11/2020 16:03, Mark Thomas wrote:
> > On 19/11/2020 15:55, Eric Robinson wrote:
>
> 
>
> >> Unfortunately, the tomcats get restarted every night, so I'll have to watch
> closely for the error to happen today and I will do a thread dump as close as
> possible to the time of the error.
> >
> > Sounds good. Tx.
> >
> > My guess based on what I saw in the strace is that the thread will be
> > something database related. Which database and which database driver
> > (version number etc) is in use? In particular, are there any native
> > components to the JDBC driver?
>
> Morning Eric,
>
> I got the latest logs. Unfortunately, it looks like the file descriptor 
> information
> wasn't included in the strace output. I'm not seeing any calls to close a file
> descriptor.
>

My fault again. I'll make sure it gets in there next time.

> The thread dump looks good. I can map information from strace to the
> thread dump successfully.
>
> It looks like you are using MySQL from the data in the strace log. Is that
> correct? What I really need to know is which version of the MySQL JDBC
> driver (Connector/J) are you using? I'm trying to figure out where the root
> cause might be (or if it is a known issue fixed in a later release).
>

Yes, MySQL 5.6.41.

Not sure how to tell the JDBC version. I unzipped the tomcat-jdbc.jar file and 
inspected the MANIFEST file, which shows the following. Is that what you need?

Manifest-Version: 1.0
Ant-Version: Apache Ant 1.9.7
Created-By: 1.6.0_45-b06 (Sun Microsystems Inc.)
Export-Package: org.apache.tomcat.jdbc.naming;uses:="javax.naming,org.
 apache.juli.logging,javax.naming.spi";version="7.0.72",org.apache.tom
 cat.jdbc.pool;uses:="org.apache.juli.logging,javax.sql,org.apache.tom
 cat.jdbc.pool.jmx,javax.management,javax.naming,javax.naming.spi,org.
 apache.tomcat.jdbc.pool.interceptor";version="7.0.72",org.apache.tomc
 at.jdbc.pool.interceptor;uses:="org.apache.tomcat.jdbc.pool,org.apach
 e.juli.logging,javax.management.openmbean,javax.management";version="
 7.0.72",org.apache.tomcat.jdbc.pool.jmx;uses:="org.apache.tomcat.jdbc
 .pool,org.apache.juli.logging,javax.management";version="7.0.72"
Bundle-Vendor: Apache Software Foundation
Bundle-Version: 7.0.72
Bundle-Name: Apache Tomcat JDBC Connection Pool
Bundle-ManifestVersion: 2
Bundle-SymbolicName: org.apache.tomcat.jdbc
Import-Package:  javax.management;version="0", javax.management.openmb
 ean;version="0", javax.naming;version="0", javax.naming.spi;version="
 0", javax.sql;version="0", org.apache.juli.logging;version="0"

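Incidentally, a driver's version can be pulled from the jar's manifest without unzipping it to disk, since a jar is just a zip archive. A minimal sketch; the in-memory jar built here is a stand-in for the real mysql-connector-java jar:

```python
import io
import zipfile

# Build a stand-in jar with a minimal manifest (a jar is just a zip).
manifest = "Manifest-Version: 1.0\nBundle-Version: 5.0.8\n"
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as jar:
    jar.writestr("META-INF/MANIFEST.MF", manifest)

# Read the version back, as one would from the real jar on disk.
with zipfile.ZipFile(buf) as jar:
    text = jar.read("META-INF/MANIFEST.MF").decode()

version = next(line.split(":", 1)[1].strip()
               for line in text.splitlines()
               if line.startswith("Bundle-Version"))
print(version)  # 5.0.8
```

Against a real file, replace `buf` with the jar's path; many driver jars also encode the version in the filename itself, as the 5.0.8 jar above does.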

> Mark
>



RE: Weirdest Tomcat Behavior Ever?

2020-11-19 Thread Eric Robinson
> From: Mark Thomas 
> Sent: Thursday, November 19, 2020 4:34 AM
> To: users@tomcat.apache.org
> Subject: Re: Weirdest Tomcat Behavior Ever?
>
> On 18/11/2020 16:28, Mark Thomas wrote:
> > On 18/11/2020 15:41, Eric Robinson wrote:
>
> 
>
> >> I tested it and we're now capturing file descriptor operations.
>
> 
>
> > I am very curious as to what we are going to see in these logs.
>
> Good news. We are a step closer to figuring out what is going on.
>
> The new strace logs show the file descriptor associated with the socket being
> closed. That means that the socket close MUST be coming from within the
> JVM. The key lines in the strace output that show this are (I've removed
> unrelated events from between these entries):
>
> 19166 15:24:21.108653 (+ 0.20) dup2(50, 336 <unfinished ...>
> 19166 15:24:21.108722 (+ 0.17) <... dup2 resumed>) = 336
> 19166 15:24:21.108778 (+ 0.27) close(336 <unfinished ...>
> 19166 15:24:21.109015 (+ 0.000152) <... close resumed>) = 0
>
> Has the Tomcat instance on the machine you obtained the last set of logs
> from been restarted since you obtained the logs? If not it should be possible
> to identify the thread that triggered the socket close as follows:
>

Sounds promising!

Unfortunately, the tomcats get restarted every night, so I'll have to watch 
closely for the error to happen today and I will do a thread dump as close as 
possible to the time of the error.

> Using the following command should trigger the generation of a thread
> dump to standard out (which may be redirected to a file).
>
> kill -3 <pid>
>
> Locate the thread dump.
>
> Each thread lists its native thread id e.g. nid=0x3522
>
> Look for the thread that has nid=0x4ade
>
> What is the first line for that thread? It should look something like this:
>
> "http-bio-8080-exec-8" #24 daemon prio=5 os_prio=0 cpu=0.17ms
> elapsed=20.54s tid=0x7f6e546c0800 nid=0x374e waiting on condition
> [0x7f6db38f7000]
>
>
> If the Tomcat instance has been restarted the native thread IDs will have
> changed. To identify the thread we'll need to repeat the last set of logs and
> once a failure is detected, take a thread dump and provide that along with
> the logs and I should be able to join the dots.
>
> Almost there...
>
> Mark
>
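
One detail worth spelling out for anyone repeating this exercise: strace prints thread IDs in decimal, while the thread dump's nid field is hexadecimal, so mapping one to the other is a single base conversion. Using the thread 19166 from the strace excerpt in this message:

```python
# strace reports decimal thread IDs; a JVM thread dump reports the
# native thread ID as a hex nid=0x... value.
strace_tid = 19166
nid = hex(strace_tid)
print(nid)  # 0x4ade, the nid to search for in the thread dump
```

The reverse direction works the same way: `int("0x4ade", 16)` recovers 19166.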



RE: Weirdest Tomcat Behavior Ever?

2020-11-18 Thread Eric Robinson
> -Original Message-
> From: Mark Thomas 
> Sent: Wednesday, November 18, 2020 3:03 AM
> To: users@tomcat.apache.org
> Subject: Re: Weirdest Tomcat Behavior Ever?
>
> On 13/11/2020 23:46, Mark Thomas wrote:
> > Eric sent me a copy of the strace (thanks Eric) and while it is
> > consistent with what has already been observed, it didn't provide any
> > new information on the socket / file descriptor being closed.
> >
> > I'd like to suggest running again with the following:
> >
> > sudo strace -r -f -e trace=network,desc -p <pid>
> >
> > That should log the file descriptor being closed (and other fd
> > activity). There are a couple of things we might be able to do with this:
> >
> > - we'll be able to determine if the socket is closed on the same or a
> >   different thread
> > - we might be able to correlate the time of closure with other logs
> >   (seems unlikely as we have this from Wireshark but you never know)
> > - the class before the close might be enlightening
>
> Hi Eric,
>
> I looked at the updated logs this morning. I don't see any additional logging
> for file descriptors in the strace output.
>
> I wonder if you need a slightly different command on your platform?
>
> I'd expect to see entries like this:
>
> [pid  8062]  0.70 openat(AT_FDCWD,
> "/home/mark/repos/asf-tomcat-master/output/build/webapps/ROOT/bg-
> nav.png",
> O_RDONLY) = 57
> [pid  8062]  0.27 fstat(57, <unfinished ...>
> [pid  8062]  0.05 <... fstat resumed>{st_mode=S_IFREG|0664,
> st_size=1401, ...}) = 0
> [pid  8062]  0.43 read(57, <unfinished ...>
> [pid  8062]  0.33 <... read
> resumed>"\211PNG\r\n\32\n\0\0\0\rIHDR\0\0\0\n\0\0\0002\10\6\0\0\0e\3
> 3J"..., 1401) = 1401
> [pid  8062]  0.13 close(57 <unfinished ...>
>
> showing file access although what I really want to see are the calls to close
> the sockets (like the last two in the sequence below from a test where I used
> telnet to perform an HTTP/1.0 request)
>
> [pid  8069]  0.124099 <... accept resumed>{sa_family=AF_INET6,
> sin6_port=htons(52656), sin6_flowinfo=htonl(0), inet_pton(AF_INET6,
> ":::127.0.0.1", _addr), sin6_scope_id=0}, [28]) = 50 ...
> [pid  8063]  0.000216 read(50,  
> [pid  8063]  0.58 <... read resumed>"GET / HTTP/1.0\r\n", 8192) = 16
> [pid  8063]  0.29 read(50,  
> [pid  8063]  0.30 <... read resumed>0x7f4f6c000e70, 8192) = -1
> EAGAIN (Resource temporarily unavailable)
> [pid  8064]  0.001061 read(50, "Host: a\r\n", 8192) = 9
> [pid  8064]  0.000239 read(50, 0x7f4f6e70, 8192) = -1 EAGAIN
> (Resource temporarily unavailable)
> [pid  8062]  0.000214 read(50, "\r\n", 8192) = 2
> [pid  8062]  0.007897 write(50, "HTTP/1.1 200 \r\nContent-Type:
> tex"..., 8192) = 8192
> [pid  8062]  0.000353 write(50, ">Tomcat Native\n
> "..., 3079) = 3079
> [pid  8062]  0.002071 getsockopt(50, SOL_SOCKET, SO_LINGER,
> {l_onoff=0, l_linger=0}, [8]) = 0
> [pid  8062]  0.000102 shutdown(50, SHUT_WR) = 0
> [pid  8068]  0.000342 close(50) = 0
>
> It is probably worth running a couple of quick tests to figure out the correct
> form of the strace command on your platform and then retesting.
>
> Mark
>

Entirely my fault. I'm new to strace, so I didn't know what to expect. I have 
now read the strace man page and I'm more up to speed. I tested it and we're 
now capturing file descriptor operations. The next batch of logs will be better.

-Eric
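Once strace is logging fd activity, the close events can be pulled out of the capture mechanically, answering Mark's two questions (which thread closed the socket, and when). A rough sketch, assuming the `strace -r -f` output format shown above; the sample log lines here are hypothetical:

```python
import re

# Hypothetical sample in "strace -r -f" format (pid column, relative time).
LOG = """\
[pid  8062]  0.000353 write(50, "...", 3079) = 3079
[pid  8062]  0.000102 shutdown(50, SHUT_WR) = 0
[pid  8068]  0.000342 close(50) = 0
"""

def close_events(text, fd):
    """Yield (pid, relative_time) for each close() of the given fd."""
    pat = re.compile(r"\[pid\s+(\d+)\]\s+([0-9.]+)\s+close\(" + str(fd) + r"\)")
    for line in text.splitlines():
        m = pat.search(line)
        if m:
            yield int(m.group(1)), float(m.group(2))

# Which thread closed fd 50, and when?
print(list(close_events(LOG, 50)))   # -> [(8068, 0.000342)]
```

The pid column distinguishes threads, so a close() appearing under a different pid than the reads/writes would show the socket being closed from another thread.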




RE: Weirdest Tomcat Behavior Ever?

2020-11-15 Thread Eric Robinson




> -Original Message-
> From: Mark Thomas 
> Sent: Friday, November 13, 2020 5:47 PM
> To: users@tomcat.apache.org
> Subject: Re: Weirdest Tomcat Behavior Ever?
>
> Eric sent me a copy of the strace (thanks Eric) and while it is consistent 
> with
> what has already been observed, it didn't provide any new information on
> the socket / file descriptor being closed.
>
> I'd like to suggest running again with the following:
>
> sudo strace -r -f -e trace=network,desc -p 
>
> That should log the file descriptor being closed (and other fd activity). 
> There
> are a couple of things we might be able to do with this:
>
> - we'll be able to determine if the socket is closed on the same or a
>   different thread
> - we might be able to correlate the time of closure with other logs
>   (seems unlikely as we have this from Wireshark but you never know)
> - the call before the close might be enlightening
>
> Mark

We will do that first thing Monday when the users start work. Hopefully it will 
turn up something!

-Eric

>
> On 13/11/2020 22:05, Paul Carter-Brown wrote:
> > lol, and there I was feeling ignored :-)
> >
> > That was the first thing I would have looked at. Is the OS reporting
> > errors to the JVM writing data or is the JVM not writing the data.
> > Strace will tell you this quite easily.
> >
> >
> > On Fri, Nov 13, 2020 at 5:27 PM Eric Robinson
> > 
> > wrote:
> >
> >>
> >>> -Original Message-
> >>> From: Paul Carter-Brown 
> >>> Sent: Friday, October 16, 2020 6:11 AM
> >>> To: Tomcat Users List 
> >>> Subject: Re: Weirdest Tomcat Behavior Ever?
> >>>
> >>> Hi Eric,
> >>>
> >>> These weird situations are sometimes best looked at by confirming
> >>> what
> >> the
> >>> OS is seeing from user-space.
> >>>
> >>> Can you run: sudo strace -r -f -e trace=network -p 
> >>>
> >>> You can then log that to a file and correlate and see if the kernel
> >>> is
> >> in fact
> >>> being asked to send the response.
> >>>
> >>> It's very insightful to see what is actually going on between the
> >>> JVM
> >> and
> >>> Kernel.
> >>>
> >>> Paul
> >>
> >> Paul, this message went to spam and I just found it!
> >>
> >> I will try this suggestion immediately.
> >>
> >> -Eric
> >>
> >>>
> >>> On Fri, Oct 16, 2020 at 12:16 PM Mark Thomas 
> wrote:
> >>>
> >>>> On 16/10/2020 10:05, Eric Robinson wrote:
> >>>>> Hi Mark --
> >>>>>
> >>>>> Those are great questions. See answers below.
> >>>>>
> >>>>>
> >>>>>> -Original Message-
> >>>>>> From: Mark Thomas 
> >>>>>> Sent: Friday, October 16, 2020 2:20 AM
> >>>>>> To: users@tomcat.apache.org
> >>>>>> Subject: Re: Weirdest Tomcat Behavior Ever?
> >>>>>>
> >>>>>> On 16/10/2020 00:27, Eric Robinson wrote:
> >>>>>>
> >>>>>> 
> >>>>>>
> >>>>>>> The localhost_access log shows a request received and an HTTP
> >>>>>>> 200
> >>>>>> response sent, as follows...
> >>>>>>>
> >>>>>>> 10.51.14.133 [15/Oct/2020:12:52:45 -0400] 57 GET
> >>>>>>> /app/code.jsp?gizmoid=64438=5=2020-10-
> >>>>>> 15
> >>>>>>>
> >>>>>>
> >>>
> lterId=0=0=71340=321072
> >>>>>> oc
> >>>>>> e
> >>>>>>> ssid=40696=0.0715816=15102020125245.789063
> >>> HTTP/1.0
> >>>>>>> ?gizmoid=64438=5=2020-10-
> >>>>>> 15=0
> >>>>>>>
> >>>>>>
> >>>
> ionDID=0=71340=321072=40696
> >>>>>> &
> >>>>>> rn
> >>>>>>> d2=0.0715816=15102020125245.789063 200
> >>>>>>>
> >>>>>>> But WireShark shows what really happened. The server received
> >>>>>>> the GET
> >>>>>> request, and then it sent a FIN to terminate the connection. So
> >>>>>> if
> >>>> tomcat sent
> 

RE: Weirdest Tomcat Behavior Ever?

2020-11-13 Thread Eric Robinson
> From: Thomas Meyer 
> Sent: Friday, November 13, 2020 9:37 AM
> To: Tomcat Users List ; Mark Thomas
> ; users@tomcat.apache.org
> Subject: Re: Weirdest Tomcat Behavior Ever?
>
>
>
> On 13 November 2020 at 10:06:18 CET, Mark Thomas wrote:
> >On 12/11/2020 14:19, Eric Robinson wrote:
> >>> From: Mark Thomas 
> >
> >
> >
> >>> I keep coming back to this. Something triggered this problem (note
> >that
> >>> trigger not necessarily the same as root cause). Given that the app,
> >Tomcat
> >>> and JVM versions didn't change that again points to some other
> >component.
> >>>
> >>
> >> Perfectly understandable. It's the oldest question in the diagnostic
> >playbook. What changed? I wish I had an answer. Whatever it was, it
> >impacted both upstream servers.
> >>
> >>> Picking just one of the wild ideas I've had is there some sort of
> >firewall, IDS,
> >>> IPS etc. that might be doing connection tracking and is, for some
> >reason,
> >>> getting it wrong and closing the connection in error?
> >>>
> >>
> >> There is no firewall or IDS software running on the upstreams. The
> >only thing that comes to mind that may have been installed during that
> >timeframe is Sophos antivirus and Solar Winds RMM. Sophos was the first
> >thing I disabled when I saw the packet issues.
> >
> >ACK.
> >
> >>>>> The aim with this logging is to provide evidence of whether or not
> >>>>> there is a file descriptor handling problem in the JRE. My
> >>>>> expectation is that with these logs we will have reached the limit
> >of
> >>>>> what we can do with Tomcat but will be able to point you in the
> >right
> >>> direction for further investigation.
> >
> >I've had a chance to review these logs.
> >
> >To answer your (offlist) question about the HTTP/1.1 vs. HTTP/1.0 in
> >the Nginx logs I *think* the Nginx logs are showing that the request
> >received by Nginx is using HTTP/1.1.
> >
> >The logging does not indicate any issue with Java's handling of file
> >descriptors. The file descriptor associated with the socket where the
> >request fails is only observed to be associated with the socket where
> >the request fails. There is no indication that the file descriptor is
> >corrupted nor is there any indication that another thread tries to use
> >the same file descriptor.
> >
> >I dug a little into the exception where the write fails:
> >
> >java.net.SocketException: Bad file descriptor (Write failed)
> >at java.net.SocketOutputStream.socketWrite0(Native Method)
> >at
> >java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:111)
> >at
> java.net.SocketOutputStream.write(SocketOutputStream.java:155)
> >at
> >org.apache.tomcat.util.net.JIoEndpoint$DebugOutputStream.write(JIoEnd
> point.java:1491)
> >at
> >org.apache.coyote.http11.InternalOutputBuffer.realWriteBytes(InternalOu
> tputBuffer.java:247)
> >at
> >org.apache.tomcat.util.buf.ByteChunk.flushBuffer(ByteChunk.java:480)
> >at
> >org.apache.coyote.http11.InternalOutputBuffer.endRequest(InternalOutp
> ut
> >Buffer.java:183)
> >...
> >
> >
> >I took a look at the JRE source code. That exception is triggered by an
> >OS level error (9, EBADF, "Bad file descriptor") when the JRE makes the
> >OS call to write to the socket.
> >
> >Everything I have found online points to one of two causes for such an
> >error:
> >a) the socket has already been closed
> >b) the OS has run out of file descriptors
>
> Was it mentioned what OS is used? What Linux kernel version?
> Are any security modules like SELinux or similar is in use?
> It's maybe possible that a tracepoint exists that can be activated to get 
> better
> understanding when the OS closes the socket.
>

Way back at the beginning of the thread. 

CentOS Linux release 7.8.2003 (Core)

[root@001app01a ~]# uname -a
Linux 001app01a.ccnva.local 3.10.0-1127.19.1.el7.x86_64 #1 SMP Tue Aug 25 
17:23:54 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

[root@001app01a ~]# sestatus
SELinux status: enabled
SELinuxfs mount:/sys/fs/selinux
SELinux root directory: /etc/selinux
Loaded policy name: targeted
Current mode:   permissive
Mode from config file:  permissive
Policy MLS status:  enabled
Policy deny_unknown status: allowed
Max kernel policy version:  31


> >
> >There is no indication 

RE: Weirdest Tomcat Behavior Ever?

2020-11-13 Thread Eric Robinson

> -Original Message-
> From: Paul Carter-Brown 
> Sent: Friday, October 16, 2020 6:11 AM
> To: Tomcat Users List 
> Subject: Re: Weirdest Tomcat Behavior Ever?
>
> Hi Eric,
>
> These weird situations are sometimes best looked at by confirming what the
> OS is seeing from user-space.
>
> Can you run: sudo strace -r -f -e trace=network -p 
>
> You can then log that to a file and correlate and see if the kernel is in fact
> being asked to send the response.
>
> It's very insightful to see what is actually going on between the JVM and
> Kernel.
>
> Paul

Paul, this message went to spam and I just found it!

I will try this suggestion immediately.

-Eric

>
> On Fri, Oct 16, 2020 at 12:16 PM Mark Thomas  wrote:
>
> > On 16/10/2020 10:05, Eric Robinson wrote:
> > > Hi Mark --
> > >
> > > Those are great questions. See answers below.
> > >
> > >
> > >> -Original Message-
> > >> From: Mark Thomas 
> > >> Sent: Friday, October 16, 2020 2:20 AM
> > >> To: users@tomcat.apache.org
> > >> Subject: Re: Weirdest Tomcat Behavior Ever?
> > >>
> > >> On 16/10/2020 00:27, Eric Robinson wrote:
> > >>
> > >> 
> > >>
> > >>> The localhost_access log shows a request received and an HTTP 200
> > >> response sent, as follows...
> > >>>
> > >>> 10.51.14.133 [15/Oct/2020:12:52:45 -0400] 57 GET
> > >>> /app/code.jsp?gizmoid=64438=5=2020-10-
> > >> 15
> > >>>
> > >>
> lterId=0=0=71340=321072
> > >> oc
> > >> e
> > >>> ssid=40696=0.0715816=15102020125245.789063
> HTTP/1.0
> > >>> ?gizmoid=64438=5=2020-10-
> > >> 15=0
> > >>>
> > >>
> ionDID=0=71340=321072=40696
> > >> &
> > >> rn
> > >>> d2=0.0715816=15102020125245.789063 200
> > >>>
> > >>> But WireShark shows what really happened. The server received the
> > >>> GET
> > >> request, and then it sent a FIN to terminate the connection. So if
> > tomcat sent
> > >> an HTTP response, it did not make it out the Ethernet card.
> > >>>
> > >>> Is this the weirdest thing or what? Ideas would sure be appreciated!
> > >>
> > >> I am assuming there is a typo in your Java version and you are
> > >> using
> > Java 8.
> > >>
> > >
> > > Yes, Java 8.
> > >
> > >> That Tomcat version is over 3.5 years old (and Tomcat 7 is EOL in
> > >> less
> > than 6
> > >> months). If you aren't already planning to upgrade (I'd suggest to
> > 9.0.x) then
> > >> you might want to start thinking about it.
> > >>
> > >
> > > Vendor constraint. It's a canned application published by a national
> > software company, and they have not officially approved tomcat 8 for
> > use on Linux yet.
> > >
> > >> I have a few ideas about what might be going on but rather than
> > >> fire out random theories I have some questions that might help
> > >> narrow things
> > down.
> > >>
> > >> 1. If this request was successful, how big is the response?
> > >>
> > >
> > > 1035 bytes.
> > >
> > >> 2. If this request was successful, how long would it typically take
> > >> to complete?
> > >>
> > >
> > > Under 60 ms.
> > >
> > >> 3. Looking at the Wireshark trace for a failed request, how long
> > >> after
> > the last
> > >> byte of the request is sent by the client does Tomcat send the FIN?
> > >>
> > >
> > > Maybe 100 microseconds.
> > >
> > >> 4. Looking at the Wireshark trace for a failed request, is the
> > >> request
> > fully sent
> > >> (including terminating CRLF etc)?
> > >>
> > >
> > > Yes, the request as seen by the tomcat server is complete and is
> > terminated by 0D 0A.
> > >
> > >> 5. Are there any proxies, firewalls etc between the user agent and
> > Tomcat?
> > >>
> > >
> > > User agent -> firewall -> nginx plus -> upstream tomcat servers
> > >
> > >> 6. What timeouts are configured for the Connector?
> > >>
> > >
> > > Sorry, which connector are you referring to?
> > >
> >

RE: Weirdest Tomcat Behavior Ever?

2020-11-13 Thread Eric Robinson
> -Original Message-
> From: Mark Thomas 
> Sent: Friday, November 13, 2020 3:06 AM
> To: users@tomcat.apache.org
> Subject: Re: Weirdest Tomcat Behavior Ever?
>
> On 12/11/2020 14:19, Eric Robinson wrote:
> >> From: Mark Thomas 
>
> 
>
> >> I keep coming back to this. Something triggered this problem (note
> >> that trigger not necessarily the same as root cause). Given that the
> >> app, Tomcat and JVM versions didn't change that again points to some
> other component.
> >>
> >
> > Perfectly understandable. It's the oldest question in the diagnostic
> playbook. What changed? I wish I had an answer. Whatever it was, it
> impacted both upstream servers.
> >
> >> Picking just one of the wild ideas I've had is there some sort of
> >> firewall, IDS, IPS etc. that might be doing connection tracking and
> >> is, for some reason, getting it wrong and closing the connection in error?
> >>
> >
> > There is no firewall or IDS software running on the upstreams. The only
> thing that comes to mind that may have been installed during that timeframe
> is Sophos antivirus and Solar Winds RMM. Sophos was the first thing I
> disabled when I saw the packet issues.
>
> ACK.
>
> >>>> The aim with this logging is to provide evidence of whether or not
> >>>> there is a file descriptor handling problem in the JRE. My
> >>>> expectation is that with these logs we will have reached the limit
> >>>> of what we can do with Tomcat but will be able to point you in the
> >>>> right
> >> direction for further investigation.
>
> I've had a chance to review these logs.
>
> To answer your (offlist) question about the HTTP/1.1 vs. HTTP/1.0 in the
> Nginx logs I *think* the Nginx logs are showing that the request received by
> Nginx is using HTTP/1.1.
>
> The logging does not indicate any issue with Java's handling of file
> descriptors. The file descriptor associated with the socket where the request
> fails is only observed to be associated with the socket where the request
> fails. There is no indication that the file descriptor is corrupted nor is 
> there
> any indication that another thread tries to use the same file descriptor.
>
> I dug a little into the exception where the write fails:
>
> java.net.SocketException: Bad file descriptor (Write failed)
> at java.net.SocketOutputStream.socketWrite0(Native Method)
> at
> java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:111)
> at
> java.net.SocketOutputStream.write(SocketOutputStream.java:155)
> at
> org.apache.tomcat.util.net.JIoEndpoint$DebugOutputStream.write(JIoEndp
> oint.java:1491)
> at
> org.apache.coyote.http11.InternalOutputBuffer.realWriteBytes(InternalOut
> putBuffer.java:247)
> at
> org.apache.tomcat.util.buf.ByteChunk.flushBuffer(ByteChunk.java:480)
> at
> org.apache.coyote.http11.InternalOutputBuffer.endRequest(InternalOutput
> Buffer.java:183)
> ...
>
>
> I took a look at the JRE source code. That exception is triggered by an OS 
> level
> error (9, EBADF, "Bad file descriptor") when the JRE makes the OS call to
> write to the socket.
>
> Everything I have found online points to one of two causes for such an
> error:
> a) the socket has already been closed
> b) the OS has run out of file descriptors
>
> There is no indication that the JRE or Tomcat or the application is doing a)
> Previous investigations have ruled out b)
>
> The wireshark trace indicates that the socket is closed before the write takes
> place which suggests a) rather more than b). Even so, I'd be tempted to
> double check b) and maybe try running Tomcat with -XX:+MaxFDLimit just to
> be sure.
>
> If you haven't already, I think now is the time to follow Paul Carter-Brown's
> advice from earlier in this thread and use strace to see what is going on
> between the JRE and the OS. The aim being to answer the question "what is
> triggering the socket close"
>

This is the second time you alluded to comments from someone I haven't seen 
in the thread. I just checked my spam folder and found that, for some unknown 
reason, 4 messages in this long thread went to spam. They were from Paul 
Carter-Brown, Konstantin Kolinko, and Daniel Skiles. They probably thought I 
ignored them.  Now I'll go check out their recommendations.

> I can try and help interpret that log but I am far from an expert. You may
> want to seek help elsewhere.
>
> Mark
>
> -
> To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
> For additional commands, e-mail: users-h...
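The OS-level failure Mark traced, errno 9 (EBADF) on the write syscall, is easy to reproduce in isolation: close a socket's underlying file descriptor and then write to it. A minimal sketch, assuming Linux (or any POSIX system):

```python
import errno
import os
import socket

s = socket.socket()
fd = s.fileno()
s.close()                         # the descriptor is now invalid at the OS level
try:
    os.write(fd, b"HTTP/1.1 200 \r\n")
except OSError as e:
    # errno 9 is EBADF ("Bad file descriptor"): the same OS error the JRE
    # surfaces as java.net.SocketException: Bad file descriptor (Write failed)
    assert e.errno == errno.EBADF
    print("write failed with EBADF")
```

This supports cause (a) above: once the fd has been closed, any later write fails this way, which is why identifying who issued the close() is the key question.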

RE: Weirdest Tomcat Behavior Ever?

2020-11-12 Thread Eric Robinson
> -Original Message-
> From: Mark Thomas 
> Sent: Thursday, November 12, 2020 4:08 AM
> To: Tomcat Users List ; Eric Robinson
> 
> Subject: Re: Weirdest Tomcat Behavior Ever?
>
> On 11/11/2020 22:48, Eric Robinson wrote:
> >> -Original Message-
> >> From: Mark Thomas 
> >> Sent: Monday, November 9, 2020 5:59 AM
> >> To: users@tomcat.apache.org
> >> Subject: Re: Weirdest Tomcat Behavior Ever?
> >>
> >> Eric,
> >>
> >> Time to prune the history and provide another summary I think. This
> >> summary isn't complete. There is more information in the history of
> >> the thread. I'm trying to focus on what seems to be the key information.
> >>
> >
> > Hi Mark -- So sorry for going silent for a couple of days. Our organization 
> > is
> neck-deep in a huge compliance project. Combine that with this issue we're
> working on together, and it's Perfect Storm time around here. We have a big
> meeting with the client and vendor tomorrow about all this and I'm working
> like heck to prevent this important customer from jumping ship.
>
> Understood. Let me know if there is anything I can do to help.
>
> > Now back to it!
> >
> >>
> >> Overview:
> >> A small number of requests are receiving a completely empty (no
> >> headers, no body) response.
> >>
> >
> > Just a FIN packet and that's all.
>
> Agreed.
>
> >> Environment
> >> Tomcat 7.0.72
> >>  - BIO HTTP (issue also observed with NIO)
> >>  - Source unknown (probably ASF)
> >> Java 1.8.0_221, Oracle
> >> CentOS 7.5, Azure
> >> Nginx reverse proxy
> >>  - Using HTTP/1.0
> >>  - No keep-alive
> >>  - No compression
> >> No (known) environment changes in the time period where this issue
> >> started
>
> I keep coming back to this. Something triggered this problem (note that
> trigger not necessarily the same as root cause). Given that the app, Tomcat
> and JVM versions didn't change that again points to some other component.
>

Perfectly understandable. It's the oldest question in the diagnostic playbook. 
What changed? I wish I had an answer. Whatever it was, it impacted both 
upstream servers.

> Picking just one of the wild ideas I've had is there some sort of firewall, 
> IDS,
> IPS etc. that might be doing connection tracking and is, for some reason,
> getting it wrong and closing the connection in error?
>

There is no firewall or IDS software running on the upstreams. The only thing 
that comes to mind that may have been installed during that timeframe is Sophos 
antivirus and Solar Winds RMM. Sophos was the first thing I disabled when I saw 
the packet issues.

> As an aside, I mentioned earlier in this thread a similar issue we have been
> observing in the CI system. I tracked that down yesterday and I am certain
> the issues are unrelated. The CI issue was NIO specific (we see this issue 
> with
> BIO and NIO) and introduced by refactoring in 8.5.x (we see this issue in
> 7.0.x). Sorry this doesn't help.
>
> >> Results from debug logging
> >> - The request is read without error
> >> - The connection close is initiated from the Tomcat/Java side
> >> - The socket is closed before Tomcat tries to write the response
> >> - The application is not triggering the close of the socket
> >> - Tomcat is not triggering the close of the socket
> >> - When Tomcat does try and write we see the following exception
> >> java.net.SocketException: Bad file descriptor (Write failed)
> >>
> >> We have confirmed that the Java process is not hitting the limit for
> >> file descriptors.
> >>
> >> The file descriptor must have been valid when the request was read
> >> from the socket.
> >>
> >> The first debug log shows 2 other active connections from Nginx to
> >> Tomcat at the point the connection is closed unexpectedly.
> >>
> >> The second debug log shows 1 other active connection from Nginx to
> >> Tomcat at the point the connection is closed unexpectedly.
> >>
> >> The third debug log shows 1 other active connection from Nginx to
> >> Tomcat at the point the connection is closed unexpectedly.
> >>
> >> The fourth debug log shows no other active connection from Nginx to
> >> Tomcat at the point the connection is closed unexpectedly.
> >>
> >>
> >> Analysis
> >>
> >> We know the connection close isn't coming from Tomcat or the
> application.
> >> That leaves:
> >> 

RE: Weirdest Tomcat Behavior Ever?

2020-11-11 Thread Eric Robinson
> -Original Message-
> From: Mark Thomas 
> Sent: Monday, November 9, 2020 5:59 AM
> To: users@tomcat.apache.org
> Subject: Re: Weirdest Tomcat Behavior Ever?
>
> Eric,
>
> Time to prune the history and provide another summary I think. This
> summary isn't complete. There is more information in the history of the
> thread. I'm trying to focus on what seems to be the key information.
>

Hi Mark -- So sorry for going silent for a couple of days. Our organization is 
neck-deep in a huge compliance project. Combine that with this issue we're 
working on together, and it's Perfect Storm time around here. We have a big 
meeting with the client and vendor tomorrow about all this and I'm working like 
heck to prevent this important customer from jumping ship.

Now back to it!

>
> Overview:
> A small number of requests are receiving a completely empty (no headers,
> no body) response.
>

Just a FIN packet and that's all.
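That symptom, a FIN with no headers and no body, is exactly what a client observes when the server side closes the socket before writing anything. A minimal sketch using a local socket pair (illustrating only the wire-level behavior, not Tomcat itself):

```python
import socket

srv, cli = socket.socketpair()           # stand-ins for Tomcat and Nginx
cli.sendall(b"GET / HTTP/1.0\r\n\r\n")   # client sends a complete request
srv.recv(1024)                           # server reads the request...
srv.close()                              # ...then closes: FIN, zero response bytes
assert cli.recv(1024) == b""             # client sees only EOF
```

From the client's side this is indistinguishable from the failures captured in Wireshark: the request goes out, is fully read, and the only thing that comes back is the connection teardown.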

> Environment
> Tomcat 7.0.72
>  - BIO HTTP (issue also observed with NIO)
>  - Source unknown (probably ASF)
> Java 1.8.0_221, Oracle
> CentOS 7.5, Azure
> Nginx reverse proxy
>  - Using HTTP/1.0
>  - No keep-alive
>  - No compression
> No (known) environment changes in the time period where this issue started
>
> Results from debug logging
> - The request is read without error
> - The connection close is initiated from the Tomcat/Java side
> - The socket is closed before Tomcat tries to write the response
> - The application is not triggering the close of the socket
> - Tomcat is not triggering the close of the socket
> - When Tomcat does try and write we see the following exception
> java.net.SocketException: Bad file descriptor (Write failed)
>
> We have confirmed that the Java process is not hitting the limit for file
> descriptors.
>
> The file descriptor must have been valid when the request was read from
> the socket.
>
> The first debug log shows 2 other active connections from Nginx to Tomcat at
> the point the connection is closed unexpectedly.
>
> The second debug log shows 1 other active connection from Nginx to Tomcat
> at the point the connection is closed unexpectedly.
>
> The third debug log shows 1 other active connection from Nginx to Tomcat at
> the point the connection is closed unexpectedly.
>
> The fourth debug log shows no other active connection from Nginx to
> Tomcat at the point the connection is closed unexpectedly.
>
>
> Analysis
>
> We know the connection close isn't coming from Tomcat or the application.
> That leaves:
> - the JVM
> - the OS
> - the virtualisation layer (since this is Azure I am assuming there is
>   one)
>
> We are approaching the limit of what we can debug via Tomcat (and my area
> of expertise). The evidence so far is pointing to an issue lower down the
> network stack (JVM, OS or virtualisation layer).
>

Can't disagree with you there.

> I think the next, and possibly last, thing we can do from Tomcat is log some
> information on the file descriptor associated with the socket. That is going 
> to
> require some reflection to read JVM internals.
>
> Patch files here:
> http://home.apache.org/~markt/dev/v7.0.72-custom-patch-v4/
>
> Source code here:
> https://github.com/markt-asf/tomcat/tree/debug-7.0.72
>

I will apply these tonight.

> The file descriptor usage count is guarded by a lock object so this patch adds
> quite a few syncs. For the load you are seeing that shouldn't be an issue but
> there is a chance it will impact performance.
>

Based on observation of load, I'm not too concerned about that. Maybe a little. 
I'll keep an eye on it.

> The aim with this logging is to provide evidence of whether or not there is a
> file descriptor handling problem in the JRE. My expectation is that with these
> logs we will have reached the limit of what we can do with Tomcat but will be
> able to point you in the right direction for further investigation.
>

I'll get this done right away.

> Mark
>
> -
> To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
> For additional commands, e-mail: users-h...@tomcat.apache.org





RE: Weirdest Tomcat Behavior Ever?

2020-11-06 Thread Eric Robinson
> > -Original Message-
> > From: Stefan Mayr 
> > Sent: Thursday, November 5, 2020 4:24 PM
> > To: users@tomcat.apache.org
> > Subject: Re: Weirdest Tomcat Behavior Ever?
> >
> > On 03.11.2020 at 16:05, Eric Robinson wrote:
> > >> -Original Message-
> > >> From: Eric Robinson 
> > >> Sent: Tuesday, November 3, 2020 8:21 AM
> > >> To: Tomcat Users List 
> > >> Subject: RE: Weirdest Tomcat Behavior Ever?
> > >>
> > >>> From: Mark Thomas 
> > >>> Sent: Tuesday, November 3, 2020 2:06 AM
> > >>> To: Tomcat Users List 
> > >>> Subject: Re: Weirdest Tomcat Behavior Ever?
> > >>>
> > >>> On 02/11/2020 12:16, Eric Robinson wrote:
> > >>>
> > >>> 
> > >>>
> > >>>> Gotcha, thanks for the clarification. Let's see what happens when
> > >>>> the users
> > >>> start hitting it at 8:00 am Eastern.
> > >>>
> > >>> Progress. The first attempt to write to the socket triggers the
> > >>> following
> > >>> exception:
> > >>>
> > >>> 02-Nov-2020 14:33:54.083 FINE [http-bio-3016-exec-13]
> > >>> org.apache.tomcat.util.net.JIoEndpoint$DebugOutputStream.write
> > >>> [301361476]
> > >>>  java.net.SocketException: Bad file descriptor (Write failed)
> > >>> at java.net.SocketOutputStream.socketWrite0(Native Method)
> > >>> at
> > >>>
> > java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:111)
> > >>> at
> java.net.SocketOutputStream.write(SocketOutputStream.java:155)
> > >>> at
> > >>>
> > ...
> >
> > >>> Because this is an instance of an IOException, Tomcat assumes it
> > >>> has been caused by the client dropping the connection and silently
> > >>> swallows it. I'll be changing that later today so the exception is
> > >>> logged as DEBUG level for all new Tomcat versions.
> > >>>
> > >>> Possible causes of "java.net.SocketException: Bad file descriptor"
> > >>> I've been able to find are:
> > >>>
> > >>> 1. OS running out of file descriptors.
> > >>>
> > >>> 2.Trying to use a closed socket.
> > >>>
> > >>> I want to review the source code to see if there are any others.
> > >>>
> > >>> I don't think we are seeing 2 as there is no indication of the
> > >>> Socket, InputStream or OutputStream being closed in the logs.
> > >>>
> > >>> That leaves 1. Possible causes here are a file descriptor leak or
> > >>> normal operations occasionally needing more than the current limit.
> > >>> I don't think it is a leak as I'd expect to see many more errors
> > >>> of this type after the first and we aren't seeing that. That
> > >>> leaves the possibility of the current limit being a little too low.
> > >>>
> > >>> My recommendation at this point is to increase the limit for file
> > descriptors.
> > >>> Meanwhile, I'll look at the JRE source to see if there are any
> > >>> other possible triggers for this exception.
> > >>>
> > >>> Mark
> > >>>
> > >>>
> > >>
> > >> On the tomcat server, max open file descriptors is currently
> > >> 2853957
> > >>
> > >> [root@001app01a ~]# sysctl fs.file-max
> > >> fs.file-max = 2853957
> > >>
> > >> Most of the time, the number of open files appears to run about
> 600,000.
> > >>
> > >>  What do you think of watching the open file count and seeing if
> > >> the number gets up around the ceiling when the socket write failure
> > >> occurs? Something like...
> > >>
> > >> [root@001app01a ~]#  while [ TRUE ];do FILES=$(lsof|wc -l);echo
> > >> "$(date
> > >> +%H:%M:%S) $FILES";done
> > >> 09:11:15 591671
> > >> 09:11:35 627347
> > >> 09:11:54 626851
> > >> 09:12:11 626429
> > >> 09:12:26 545748
> > >> 09:12:42 548578
> > >> 09:12:58 551487
> > >> 09:13:14 516700
> > >> 09:13:30 513312
> > >> 09:13:45 512830
> > >> 09:14:02 58
> > >> 09:14:18 5

RE: Weirdest Tomcat Behavior Ever?

2020-11-06 Thread Eric Robinson
> -Original Message-
> From: Stefan Mayr 
> Sent: Thursday, November 5, 2020 4:24 PM
> To: users@tomcat.apache.org
> Subject: Re: Weirdest Tomcat Behavior Ever?
>
> On 03.11.2020 at 16:05, Eric Robinson wrote:
> >> -Original Message-
> >> From: Eric Robinson 
> >> Sent: Tuesday, November 3, 2020 8:21 AM
> >> To: Tomcat Users List 
> >> Subject: RE: Weirdest Tomcat Behavior Ever?
> >>
> >>> From: Mark Thomas 
> >>> Sent: Tuesday, November 3, 2020 2:06 AM
> >>> To: Tomcat Users List 
> >>> Subject: Re: Weirdest Tomcat Behavior Ever?
> >>>
> >>> On 02/11/2020 12:16, Eric Robinson wrote:
> >>>
> >>> 
> >>>
> >>>> Gotcha, thanks for the clarification. Let's see what happens when
> >>>> the users
> >>> start hitting it at 8:00 am Eastern.
> >>>
> >>> Progress. The first attempt to write to the socket triggers the
> >>> following
> >>> exception:
> >>>
> >>> 02-Nov-2020 14:33:54.083 FINE [http-bio-3016-exec-13]
> >>> org.apache.tomcat.util.net.JIoEndpoint$DebugOutputStream.write
> >>> [301361476]
> >>>  java.net.SocketException: Bad file descriptor (Write failed)
> >>> at java.net.SocketOutputStream.socketWrite0(Native Method)
> >>> at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:111)
> >>> at java.net.SocketOutputStream.write(SocketOutputStream.java:155)
> >>> ...
>
> >>> Because this is an instance of an IOException, Tomcat assumes it has
> >>> been caused by the client dropping the connection and silently
> >>> swallows it. I'll be changing that later today so the exception is
> >>> logged as DEBUG level for all new Tomcat versions.
> >>>
> >>> Possible causes of "java.net.SocketException: Bad file descriptor"
> >>> I've been able to find are:
> >>>
> >>> 1. OS running out of file descriptors.
> >>>
> >>> 2. Trying to use a closed socket.
> >>>
> >>> I want to review the source code to see if there are any others.
> >>>
> >>> I don't think we are seeing 2 as there is no indication of the
> >>> Socket, InputStream or OutputStream being closed in the logs.
> >>>
> >>> That leaves 1. Possible causes here are a file descriptor leak or
> >>> normal operations occasionally needing more than the current limit.
> >>> I don't think it is a leak as I'd expect to see many more errors of
> >>> this type after the first and we aren't seeing that. That leaves the
> >>> possibility of the current limit being a little too low.
> >>>
> >>> My recommendation at this point is to increase the limit for file
> descriptors.
> >>> Meanwhile, I'll look at the JRE source to see if there are any other
> >>> possible triggers for this exception.
> >>>
> >>> Mark
> >>>
> >>>
> >>
> >> On the tomcat server, max open file descriptors is currently 2853957
> >>
> >> [root@001app01a ~]# sysctl fs.file-max
> >> fs.file-max = 2853957
> >>
> >> Most of the time, the number of open files appears to run about 600,000.
> >>
> >>  What do you think of watching the open file count and seeing if the
> >> number gets up around the ceiling when the socket write failure
> >> occurs? Something like...
> >>
> >> [root@001app01a ~]#  while [ TRUE ];do FILES=$(lsof|wc -l);echo "$(date +%H:%M:%S) $FILES";done
> >> 09:11:15 591671
> >> 09:11:35 627347
> >> 09:11:54 626851
> >> 09:12:11 626429
> >> 09:12:26 545748
> >> 09:12:42 548578
> >> 09:12:58 551487
> >> 09:13:14 516700
> >> 09:13:30 513312
> >> 09:13:45 512830
> >> 09:14:02 58
> >> 09:14:18 568233
> >> 09:14:35 570158
> >> 09:14:51 566269
> >> 09:15:07 547389
> >> 09:15:23 544203
> >> 09:15:38 546395
> >>
> >> It's not ideal, as it seems to take 15-20 seconds to count them using lsof.
> >>
> >>
> >>
> >
> > Wait, never mind. I realized the per-process limits are what matters. I
> checked, and nofile was set to 4096 for the relevant java process.
> >
> > I did...
> >
> > # prlimit --pid 87

RE: Weirdest Tomcat Behavior Ever?

2020-11-04 Thread Eric Robinson
> From: Mark Thomas 
> Sent: Wednesday, November 4, 2020 11:39 AM
> To: users@tomcat.apache.org
> Subject: Re: Weirdest Tomcat Behavior Ever?
>
> On 03/11/2020 15:05, Eric Robinson wrote:
> >> From: Eric Robinson 
> >>> From: Mark Thomas 
>
> 
>
> >>> Progress. The first attempt to write to the socket triggers the
> >>> following
> >>> exception:
> >>>
> >>> 02-Nov-2020 14:33:54.083 FINE [http-bio-3016-exec-13]
> >>> org.apache.tomcat.util.net.JIoEndpoint$DebugOutputStream.write
> >>> [301361476]
> >>>  java.net.SocketException: Bad file descriptor (Write failed)
>
> 
>
> >>> 1. OS running out of file descriptors.
> >>>
> >>> 2. Trying to use a closed socket.
> >>>
> >>> I want to review the source code to see if there are any others.
>
> There is an option 3 - a JVM bug. There were bugs in this area back in the
> 1.2/1.3/1.4 days. It seems unlikely that such a bug resurfaced now - 
> especially
> given that the issue happens with NIO as well as BIO.
>
> >>> I don't think we are seeing 2 as there is no indication of the
> >>> Socket, InputStream or OutputStream being closed in the logs.
> >>>
> >>> That leaves 1. Possible causes here are a file descriptor leak or
> >>> normal operations occasionally needing more than the current limit.
> >>> I don't think it is a leak as I'd expect to see many more errors of
> >>> this type after the first and we aren't seeing that. That leaves the
> >>> possibility of the current limit being a little too low.
> >>>
> >>> My recommendation at this point is to increase the limit for file
> descriptors.
> >>> Meanwhile, I'll look at the JRE source to see if there are any other
> >>> possible triggers for this exception.
>
> 
>
> > Wait, never mind. I realized the per-process limits are what matters. I
> checked, and nofile was set to 4096 for the relevant java process.
> >
> > I did...
> >
> > # prlimit --pid 8730 --nofile=16384:16384
> >
> > That should give java some extra breathing room if the issue is max open
> files, right?
>
> I'm not the person to ask that question. Linux administration is not an area 
> I'd
> consider myself sufficiently knowledgeable to give a definitive answer. It
> looks OK based on some quick searching.
>
> How have things been with the higher limit? More issues, fewer issues,
> about the same? Or maybe even no issues (he asks hopefully)?
>
> Mark
>

Not enough data collected to know yet. We did see at least one instance of the 
error, but I'll know better tomorrow.
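For reference, the per-process limit that the prlimit change adjusts can be confirmed directly from /proc once the PID is known. A minimal sketch, using the current shell's PID as a stand-in for the Tomcat java PID (8730 in the thread):

```shell
# fs.file-max is the system-wide ceiling; if fd exhaustion is the cause of
# "Bad file descriptor", the governing value is the per-process RLIMIT_NOFILE.
# /proc/<pid>/limits shows the soft and hard limits in effect for a live process.
pid=$$   # stand-in: substitute the Tomcat java PID in practice
grep 'Max open files' "/proc/$pid/limits"
```

If the prlimit change took effect, both the soft and hard columns for the java PID should read 16384.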

Disclaimer : This email and any files transmitted with it are confidential and 
intended solely for intended recipients. If you are not the named addressee you 
should not disseminate, distribute, copy or alter this email. Any views or 
opinions presented in this email are solely those of the author and might not 
represent those of Physician Select Management. Warning: Although Physician 
Select Management has taken reasonable precautions to ensure no viruses are 
present in this email, the company cannot accept responsibility for any loss or 
damage arising from the use of this email or attachments.

-
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org



RE: Weirdest Tomcat Behavior Ever?

2020-11-03 Thread Eric Robinson
> From: Christopher Schultz 
> Sent: Tuesday, November 3, 2020 9:26 AM
> To: users@tomcat.apache.org
> Subject: Re: Weirdest Tomcat Behavior Ever?
>
> Eric,
>
> On 11/3/20 10:05, Eric Robinson wrote:
> >> -Original Message-
> >> From: Eric Robinson 
> >> Sent: Tuesday, November 3, 2020 8:21 AM
> >> To: Tomcat Users List 
> >> Subject: RE: Weirdest Tomcat Behavior Ever?
> >>
> >>> From: Mark Thomas 
> >>> Sent: Tuesday, November 3, 2020 2:06 AM
> >>> To: Tomcat Users List 
> >>> Subject: Re: Weirdest Tomcat Behavior Ever?
> >>>
> >>> On 02/11/2020 12:16, Eric Robinson wrote:
> >>>
> >>> 
> >>>
> >>>> Gotcha, thanks for the clarification. Let's see what happens when
> >>>> the users
> >>> start hitting it at 8:00 am Eastern.
> >>>
> >>> Progress. The first attempt to write to the socket triggers the
> >>> following
> >>> exception:
> >>>
> >>> 02-Nov-2020 14:33:54.083 FINE [http-bio-3016-exec-13]
> >>> org.apache.tomcat.util.net.JIoEndpoint$DebugOutputStream.write
> >>> [301361476]
> >>>   java.net.SocketException: Bad file descriptor (Write failed)
> >>>  at java.net.SocketOutputStream.socketWrite0(Native Method)
> >>>  at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:111)
> >>>  at java.net.SocketOutputStream.write(SocketOutputStream.java:155)
> >>>  at org.apache.tomcat.util.net.JIoEndpoint$DebugOutputStream.write(JIoEndpoint.java:1409)
> >>>  at org.apache.coyote.http11.InternalOutputBuffer.realWriteBytes(InternalOutputBuffer.java:247)
> >>>  at org.apache.tomcat.util.buf.ByteChunk.flushBuffer(ByteChunk.java:480)
> >>>  at org.apache.coyote.http11.InternalOutputBuffer.endRequest(InternalOutputBuffer.java:183)
> >>>  at org.apache.coyote.http11.AbstractHttp11Processor.action(AbstractHttp11Processor.java:761)
> >>>  at org.apache.coyote.Response.action(Response.java:174)
> >>>  at org.apache.coyote.Response.finish(Response.java:274)
> >>>  at org.apache.catalina.connector.OutputBuffer.close(OutputBuffer.java:322)
> >>>  at org.apache.catalina.connector.Response.finishResponse(Response.java:537)
> >>>  at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:480)
> >>>  at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1083)
> >>>  at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:640)
> >>>  at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:321)
> >>>  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> >>>  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> >>>  at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
> >>>  at java.lang.Thread.run(Thread.java:748)
> >>>
> >>> Because this is an instance of an IOException, Tomcat assumes it has
> >>> been caused by the client dropping the connection and silently
> >>> swallows it. I'll be changing that later today so the exception is
> >>> logged as DEBUG level for all new Tomcat versions.
> >>>
> >>> Possible causes of "java.net.SocketException: Bad file descriptor"
> >>> I've been able to find are:
> >>>
> >>> 1. OS running out of file descriptors.
> >>>
> >>> 2. Trying to use a closed

RE: Weirdest Tomcat Behavior Ever?

2020-11-03 Thread Eric Robinson
> -Original Message-
> From: Eric Robinson 
> Sent: Tuesday, November 3, 2020 8:21 AM
> To: Tomcat Users List 
> Subject: RE: Weirdest Tomcat Behavior Ever?
>
> > From: Mark Thomas 
> > Sent: Tuesday, November 3, 2020 2:06 AM
> > To: Tomcat Users List 
> > Subject: Re: Weirdest Tomcat Behavior Ever?
> >
> > On 02/11/2020 12:16, Eric Robinson wrote:
> >
> > 
> >
> > > Gotcha, thanks for the clarification. Let's see what happens when
> > > the users
> > start hitting it at 8:00 am Eastern.
> >
> > Progress. The first attempt to write to the socket triggers the
> > following
> > exception:
> >
> > 02-Nov-2020 14:33:54.083 FINE [http-bio-3016-exec-13]
> > org.apache.tomcat.util.net.JIoEndpoint$DebugOutputStream.write
> > [301361476]
> >  java.net.SocketException: Bad file descriptor (Write failed)
> > at java.net.SocketOutputStream.socketWrite0(Native Method)
> > at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:111)
> > at java.net.SocketOutputStream.write(SocketOutputStream.java:155)
> > at org.apache.tomcat.util.net.JIoEndpoint$DebugOutputStream.write(JIoEndpoint.java:1409)
> > at org.apache.coyote.http11.InternalOutputBuffer.realWriteBytes(InternalOutputBuffer.java:247)
> > at org.apache.tomcat.util.buf.ByteChunk.flushBuffer(ByteChunk.java:480)
> > at org.apache.coyote.http11.InternalOutputBuffer.endRequest(InternalOutputBuffer.java:183)
> > at org.apache.coyote.http11.AbstractHttp11Processor.action(AbstractHttp11Processor.java:761)
> > at org.apache.coyote.Response.action(Response.java:174)
> > at org.apache.coyote.Response.finish(Response.java:274)
> > at org.apache.catalina.connector.OutputBuffer.close(OutputBuffer.java:322)
> > at org.apache.catalina.connector.Response.finishResponse(Response.java:537)
> > at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:480)
> > at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1083)
> > at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:640)
> > at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:321)
> > at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> > at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> > at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
> > at java.lang.Thread.run(Thread.java:748)
> >
> > Because this is an instance of an IOException, Tomcat assumes it has
> > been caused by the client dropping the connection and silently
> > swallows it. I'll be changing that later today so the exception is
> > logged as DEBUG level for all new Tomcat versions.
> >
> > Possible causes of "java.net.SocketException: Bad file descriptor"
> > I've been able to find are:
> >
> > 1. OS running out of file descriptors.
> >
> > 2. Trying to use a closed socket.
> >
> > I want to review the source code to see if there are any others.
> >
> > I don't think we are seeing 2 as there is no indication of the Socket,
> > InputStream or OutputStream being closed in the logs.
> >
> > That leaves 1. Possible causes here are a file descriptor leak or
> > normal operations occasionally needing more than the current limit. I
> > don't think it is a leak as I'd expect to see many more errors of this
> > type after the first and we aren't seeing that. That leaves the
> > possibility of the current limit being a little too low.
> >
> > My recommendation at this point is to increase the limit for file 
> > descriptors.
> > Meanwhile, I'll look at the JRE source to see if there are any other
> > possible triggers for this exception.
> >
> > Mark
> >
> >
>
> On the tomcat server, max open file descriptors is currently 2853957
>
> [root@001app01a ~]# sysctl fs.file-max
> fs.file-max = 2853957
>
> Most of the time, the number of open files appears to run about 600,000.
>
>  What do you think of watching the open file count and seeing if the number
> gets up around the ceiling 

RE: Weirdest Tomcat Behavior Ever?

2020-11-03 Thread Eric Robinson
> From: Mark Thomas 
> Sent: Tuesday, November 3, 2020 2:06 AM
> To: Tomcat Users List 
> Subject: Re: Weirdest Tomcat Behavior Ever?
>
> On 02/11/2020 12:16, Eric Robinson wrote:
>
> 
>
> > Gotcha, thanks for the clarification. Let's see what happens when the users
> start hitting it at 8:00 am Eastern.
>
> Progress. The first attempt to write to the socket triggers the following
> exception:
>
> 02-Nov-2020 14:33:54.083 FINE [http-bio-3016-exec-13]
> org.apache.tomcat.util.net.JIoEndpoint$DebugOutputStream.write
> [301361476]
>  java.net.SocketException: Bad file descriptor (Write failed)
> at java.net.SocketOutputStream.socketWrite0(Native Method)
> at
> java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:111)
> at java.net.SocketOutputStream.write(SocketOutputStream.java:155)
> at
> org.apache.tomcat.util.net.JIoEndpoint$DebugOutputStream.write(JIoEndp
> oint.java:1409)
> at
> org.apache.coyote.http11.InternalOutputBuffer.realWriteBytes(InternalOut
> putBuffer.java:247)
> at org.apache.tomcat.util.buf.ByteChunk.flushBuffer(ByteChunk.java:480)
> at
> org.apache.coyote.http11.InternalOutputBuffer.endRequest(InternalOutput
> Buffer.java:183)
> at
> org.apache.coyote.http11.AbstractHttp11Processor.action(AbstractHttp11Pr
> ocessor.java:761)
> at org.apache.coyote.Response.action(Response.java:174)
> at org.apache.coyote.Response.finish(Response.java:274)
> at
> org.apache.catalina.connector.OutputBuffer.close(OutputBuffer.java:322)
> at
> org.apache.catalina.connector.Response.finishResponse(Response.java:537)
> at
> org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:4
> 80)
> at
> org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11P
> rocessor.java:1083)
> at
> org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(Ab
> stractProtocol.java:640)
> at
> org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.ja
> va:321)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.jav
> a:1149)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.ja
> va:624)
> at
> org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThr
> ead.java:61)
> at java.lang.Thread.run(Thread.java:748)
>
> Because this is an instance of an IOException, Tomcat assumes it has been
> caused by the client dropping the connection and silently swallows it. I'll be
> changing that later today so the exception is logged as DEBUG level for all
> new Tomcat versions.
>
> Possible causes of "java.net.SocketException: Bad file descriptor" I've been
> able to find are:
>
> 1. OS running out of file descriptors.
>
> 2. Trying to use a closed socket.
>
> I want to review the source code to see if there are any others.
>
> I don't think we are seeing 2 as there is no indication of the Socket,
> InputStream or OutputStream being closed in the logs.
>
> That leaves 1. Possible causes here are a file descriptor leak or normal
> operations occasionally needing more than the current limit. I don't think it 
> is
> a leak as I'd expect to see many more errors of this type after the first and
> we aren't seeing that. That leaves the possibility of the current limit being 
> a
> little too low.
>
> My recommendation at this point is to increase the limit for file descriptors.
> Meanwhile, I'll look at the JRE source to see if there are any other possible
> triggers for this exception.
>
> Mark
>
>

On the tomcat server, max open file descriptors is currently 2853957

[root@001app01a ~]# sysctl fs.file-max
fs.file-max = 2853957

Most of the time, the number of open files appears to run about 600,000.

 What do you think of watching the open file count and seeing if the number 
gets up around the ceiling when the socket write failure occurs? Something 
like...

[root@001app01a ~]#  while [ TRUE ];do FILES=$(lsof|wc -l);echo "$(date +%H:%M:%S) $FILES";done
09:11:15 591671
09:11:35 627347
09:11:54 626851
09:12:11 626429
09:12:26 545748
09:12:42 548578
09:12:58 551487
09:13:14 516700
09:13:30 513312
09:13:45 512830
09:14:02 58
09:14:18 568233
09:14:35 570158
09:14:51 566269
09:15:07 547389
09:15:23 544203
09:15:38 546395

It's not ideal, as it seems to take 15-20 seconds to count them using lsof.
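As an aside, counting a single process's descriptors via /proc avoids the system-wide lsof walk entirely and returns almost instantly. A sketch under the assumption of a Linux host, with the current shell's PID standing in for the java PID:

```shell
# Poll the open-fd count of one process by listing /proc/<pid>/fd;
# unlike a global "lsof | wc -l" this completes in milliseconds, so the
# sample interval is controlled by the sleep rather than by lsof's runtime.
pid=$$   # stand-in: substitute the Tomcat java PID in practice
for i in 1 2 3; do
    printf '%s %s\n' "$(date +%H:%M:%S)" "$(ls "/proc/$pid/fd" | wc -l)"
    sleep 1
done
```

This also measures the per-process count, which is what RLIMIT_NOFILE constrains, rather than the system-wide total that lsof reports.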





RE: Weirdest Tomcat Behavior Ever?

2020-11-02 Thread Eric Robinson
> From: Mark Thomas 
> Sent: Monday, November 2, 2020 5:38 AM
> To: users@tomcat.apache.org
> Subject: Re: Weirdest Tomcat Behavior Ever?
>
> On 02/11/2020 11:18, Eric Robinson wrote:
> >> -Original Message-
> >> From: Mark Thomas 
> >> Sent: Sunday, November 1, 2020 11:50 AM
> >> To: users@tomcat.apache.org
> >> Subject: Re: Weirdest Tomcat Behavior Ever?
> >>
> >> On 01/11/2020 16:25, Mark Thomas wrote:
> >>> 
> >>>
> >>> Keeping the previous logs for reference:
> >>>
> >>>>> Source  Time Activity
> >>>>> 
> >>>>> pcap15:14:25.375451  SYN proxy to Tomcat
> >>>>> pcap15:14:25.375493  SYN, ACK Tomcat to proxy
> >>>>> pcap15:14:25.375839  ACK proxy to Tomcat
> >>>>> pcap15:14:25.375892  GET request proxy to Tomcat
> >>>>> pcap15:14:25.375911  ACK Tomcat to proxy
> >>>>> debug   15:14:25.376 o.a.c.http11.InternalOutputBuffer.init
> >>>>> pcap15:14:25.376777  FIN Tomcat to proxy
> >>>>> pcap15:14:25.377036  FIN, ACK proxy to Tomcat
> >>>>> pcap15:14:25.377048  ACK Tomcat proxy
> >>>>> debug   15:14:25.383 o.a.c.http11.InternalOutputBuffer.commit
> >>>>> debug   15:14:25.383 o.a.c.http11.InternalOutputBuffer$1.doWrite
> >>>>> debug   15:14:25.383
> o.a.c.http11.InternalOutputBuffer$1.nextRequest
> >>>>> debug   15:14:25.383 o.a.c.http11.InternalOutputBuffer$1.recycle
> >>>>>
> >>>>> Tomcat writes the request body to the buffer but when Tomcat tries
> >>>>> to flush those bytes out to the network it finds that the socket
> >>>>> has been closed. That normally indicates that the client has
> >>>>> dropped the connection. This is sufficiently common that Tomcat
> >>>>> swallows the exception. However, we know (from the pcap file) that
> >>>>> the client did not drop the connection, Tomcat did.
> >>>
> >>> The next round of logging added a little more detail for the
> >>> InternalOutputBuffer and wrapped the Socket to see when close() was
> >> called.
> >>>
> >>> The results from the next round of logging are:
> >>>
> >>> Source  Time Activity
> >>> 
> >>> pcap13:31:26.344453  SYN proxy to Tomcat
> >>> pcap13:31:26.344481  SYN, ACK Tomcat to proxy
> >>> debug   13:31:26.345 DebugSocket object created
> >>> debug   13:31:26.345 o.a.c.http11.InternalOutputBuffer.init
> >>> pcap13:31:26.345138  ACK proxy to Tomcat
> >>> pcap13:31:26.345174  GET request proxy to Tomcat
> >>> pcap13:31:26.345194  ACK Tomcat to proxy
> >>> pcap13:31:26.395281  FIN, ACK Tomcat to proxy
> >>> pcap13:31:26.395725  ACK proxy to Tomcat
> >>> pcap13:31:26.395741  FIN, ACK proxy to Tomcat
> >>> pcap13:31:26.395754  ACK Tomcat to proxy
> >>> debug   13:31:26.403 o.a.c.http11.InternalOutputBuffer.commit
> >>> debug   13:31:26.403 o.a.c.http11.InternalOutputBuffer$1.doWrite
> >>> debug   13:31:26.404 o.a.c.http11.InternalOutputBuffer$1.nextRequest
> >>> debug   13:31:26.404 o.a.c.http11.InternalOutputBuffer$1.recycle
> >>> debug   13:31:26.404 DebugSocket.close called
> >>>
> >>> This shows that the socket is closed long before Tomcat tries to
> >>> write to it (that would be after the doWrite but before nextRequest)
> >>> or Tomcat explicitly closes the socket.
> >>>
> >>> This also shows that neither Tomcat nor the application are directly
> >>> calling close() on the socket to trigger the close shown by pcap.
> >>>
> >>> I continue to come up with theories as to what might be happening
> >>> but they all seem unlikely.
> >>>
> >>> This is the BIO connector so the only time the socket should change
> >>> state is during a method call. While it might seem a little over the
> >>> top I think the next step is to log every single method call to
> >>> DebugSocket along with any exception generated. We need to try and
> >>> correlate the premature socket closure with something in the JVM. If
> >>

RE: Weirdest Tomcat Behavior Ever?

2020-11-02 Thread Eric Robinson
> -Original Message-
> From: Mark Thomas 
> Sent: Sunday, November 1, 2020 11:50 AM
> To: users@tomcat.apache.org
> Subject: Re: Weirdest Tomcat Behavior Ever?
>
> On 01/11/2020 16:25, Mark Thomas wrote:
> > 
> >
> > Keeping the previous logs for reference:
> >
> >>> Source  Time Activity
> >>> 
> >>> pcap15:14:25.375451  SYN proxy to Tomcat
> >>> pcap15:14:25.375493  SYN, ACK Tomcat to proxy
> >>> pcap15:14:25.375839  ACK proxy to Tomcat
> >>> pcap15:14:25.375892  GET request proxy to Tomcat
> >>> pcap15:14:25.375911  ACK Tomcat to proxy
> >>> debug   15:14:25.376 o.a.c.http11.InternalOutputBuffer.init
> >>> pcap15:14:25.376777  FIN Tomcat to proxy
> >>> pcap15:14:25.377036  FIN, ACK proxy to Tomcat
> >>> pcap15:14:25.377048  ACK Tomcat proxy
> >>> debug   15:14:25.383 o.a.c.http11.InternalOutputBuffer.commit
> >>> debug   15:14:25.383 o.a.c.http11.InternalOutputBuffer$1.doWrite
> >>> debug   15:14:25.383 o.a.c.http11.InternalOutputBuffer$1.nextRequest
> >>> debug   15:14:25.383 o.a.c.http11.InternalOutputBuffer$1.recycle
> >>>
> >>> Tomcat writes the request body to the buffer but when Tomcat tries
> >>> to flush those bytes out to the network it finds that the socket has
> >>> been closed. That normally indicates that the client has dropped the
> >>> connection. This is sufficiently common that Tomcat swallows the
> >>> exception. However, we know (from the pcap file) that the client did
> >>> not drop the connection, Tomcat did.
> >
> > The next round of logging added a little more detail for the
> > InternalOutputBuffer and wrapped the Socket to see when close() was
> called.
> >
> > The results from the next round of logging are:
> >
> > Source  Time Activity
> > 
> > pcap13:31:26.344453  SYN proxy to Tomcat
> > pcap13:31:26.344481  SYN, ACK Tomcat to proxy
> > debug   13:31:26.345 DebugSocket object created
> > debug   13:31:26.345 o.a.c.http11.InternalOutputBuffer.init
> > pcap13:31:26.345138  ACK proxy to Tomcat
> > pcap13:31:26.345174  GET request proxy to Tomcat
> > pcap13:31:26.345194  ACK Tomcat to proxy
> > pcap13:31:26.395281  FIN, ACK Tomcat to proxy
> > pcap13:31:26.395725  ACK proxy to Tomcat
> > pcap13:31:26.395741  FIN, ACK proxy to Tomcat
> > pcap13:31:26.395754  ACK Tomcat to proxy
> > debug   13:31:26.403 o.a.c.http11.InternalOutputBuffer.commit
> > debug   13:31:26.403 o.a.c.http11.InternalOutputBuffer$1.doWrite
> > debug   13:31:26.404 o.a.c.http11.InternalOutputBuffer$1.nextRequest
> > debug   13:31:26.404 o.a.c.http11.InternalOutputBuffer$1.recycle
> > debug   13:31:26.404 DebugSocket.close called
> >
> > This shows that the socket is closed long before Tomcat tries to write
> > to it (that would be after the doWrite but before nextRequest) or
> > Tomcat explicitly closes the socket.
> >
> > This also shows that neither Tomcat nor the application are directly
> > calling close() on the socket to trigger the close shown by pcap.
> >
> > I continue to come up with theories as to what might be happening but
> > they all seem unlikely.
> >
> > This is the BIO connector so the only time the socket should change
> > state is during a method call. While it might seem a little over the
> > top I think the next step is to log every single method call to
> > DebugSocket along with any exception generated. We need to try and
> > correlate the premature socket closure with something in the JVM. If
> > there is no correlation to find then we really are into the realm of
> > very strange JVM  and/or OS bugs.
> >
> > I'll start work on v3 of the debug patch.
>
> http://home.apache.org/~markt/dev/v7.0.72-custom-patch-v3/
>
> I opted to wrap both the InputStream and OutputStream associated with the
> socket and log every method call and every exception thrown. I opted not to
> log parameters with the exception of socket timeout since that might be
> relevant.
>
> The debug logs will be noticeably more verbose than last time. Feel free to
> adjust the number/size of debug log files as suits your environment.
>
> Mark
>

I applied the V3 patch and confirmed that content is being written to 
debug0.log, but right away I noticed an error recurring repeatedly. See below.



02-Nov-2020 06:17:12.992 FINE [http-bio-3016-exec-1] 
org.apache.tomcat.util.net.JIoEndpoint$DebugSocket.close bind
02-Nov-2020 06:17:15.634 FINE [http-bio-3016-Acceptor-0] 
org.apache.tomcat.util.net.JIoEndpoint$DebugSocket.<init> DebugSocket 
[783035752], inner Socket [1397828144] for client port [52730]
02-Nov-2020 06:17:15.634 FINE [http-bio-3016-Acceptor-0] 
org.apache.tomcat.util.net.JIoEndpoint$DebugSocket.setSoLinger [783035752]
02-Nov-2020 06:17:15.635 FINE [http-bio-3016-Acceptor-0] 
org.apache.tomcat.util.net.JIoEndpoint$DebugSocket.setSoTimeout [783035752], 
timeout [2]

RE: Weirdest Tomcat Behavior Ever?

2020-10-29 Thread Eric Robinson
> -Original Message-
> From: Mark Thomas 
> Sent: Thursday, October 29, 2020 5:45 AM
> To: users@tomcat.apache.org
> Subject: Re: Weirdest Tomcat Behavior Ever?
>
> On 28/10/2020 20:32, Mark Thomas wrote:
>
> 
>
> > I have the off-list mail and will start looking at the logs shortly.
>
> Progress. I think. I'll start with the following summary of the log data.
>
> Source  Time Activity
> 
> pcap15:14:25.375451  SYN proxy to Tomcat
> pcap15:14:25.375493  SYN, ACK Tomcat to proxy
> pcap15:14:25.375839  ACK proxy to Tomcat
> pcap15:14:25.375892  GET request proxy to Tomcat
> pcap15:14:25.375911  ACK Tomcat to proxy
> debug   15:14:25.376 o.a.c.http11.InternalOutputBuffer.init
> pcap15:14:25.376777  FIN Tomcat to proxy
> pcap15:14:25.377036  FIN, ACK proxy to Tomcat
> pcap15:14:25.377048  ACK Tomcat proxy
> debug   15:14:25.383 o.a.c.http11.InternalOutputBuffer.commit
> debug   15:14:25.383 o.a.c.http11.InternalOutputBuffer$1.doWrite
> debug   15:14:25.383 o.a.c.http11.InternalOutputBuffer$1.nextRequest
> debug   15:14:25.383 o.a.c.http11.InternalOutputBuffer$1.recycle
>
> Tomcat writes the request body to the buffer but when Tomcat tries to flush
> those bytes out to the network it finds that the socket has been closed. That
> normally indicates that the client has dropped the connection. This is
> sufficiently common that Tomcat swallows the exception. However, we
> know (from the pcap file) that the client did not drop the connection, Tomcat
> did.
>

That's the first hard evidence of where the problem lies. I feel like we're 
zeroing in on it.

> What is strange here is that with BIO is there is a 1-2-1 relationship between
> threads and sockets for the life of the socket. While I can see how a retained
> reference could truncate a response (including the
> headers) I don't yet see how the socket could get closed.
>
> I think more debug logging is required. I am currently working on that.
>

I'll apply the new patch and restart the tomcat this evening. Just to be safe, 
I'm only applying it to one of the tomcat instances.

--Eric

> Mark
>



RE: Weirdest Tomcat Behavior Ever?

2020-10-28 Thread Eric Robinson
> From: Eric Robinson 
> Sent: Tuesday, October 27, 2020 11:33 PM
> To: Tomcat Users List 
> Subject: RE: Weirdest Tomcat Behavior Ever?
>
> > From: Mark Thomas 
> > Sent: Tuesday, October 27, 2020 12:06 PM
> > To: users@tomcat.apache.org
> > Subject: Re: Weirdest Tomcat Behavior Ever?
> >
> > On 27/10/2020 16:29, Eric Robinson wrote:
> > >> On 27/10/2020 15:22, Eric Robinson wrote:
> >
> > 
> >
> > >>> I had switched to the NIO connector last week. Is that why the
> > >>> logs are still
> > >> at 0 bytes?
> > >>
> > >> Yes. I only added the debug logging to BIO.
> > >>
> > >> Somewhere in a previous thread I recommended switching back to BIO
> > >> as the code is simpler and therefore easier to debug.
> > >>
> > >> Note that my previous analysis of the access log valve entries was
> > >> based on the assumption you had switched back to BIO.
> > >>
> > >> Given everything else is in place, I'd suggest switching back to
> > >> BIO when you can, waiting for the issue to re-appear and then
> > >> looking at the
> > debug logs.
> > >>
> > >> Mark
> > >>
> > >>
> > >
> > > Oh man, I must have missed the recommendation to switch back to BIO.
> > > I
> > will do that ASAP, most likely this evening. My apologies for causing
> > the wasted effort.
> >
> > No worries.
> >
> > Looking at the NIO code, if you had switched back to BIO last night
> > I'm fairly sure we'd now be adding the debug logging to BIO anyway so
> > we'll end up in the same place.
> >
> > Now seems like a good time to point out that, as email based debugging
> > sessions go, this is very much at the better end. Having someone as
> > responsive as you are, who actually answers the questions asked and is
> > willing to run custom patches to collect debug info makes it much more
> > likely that not only will we reach a successful conclusion, but that
> > we'll get there reasonably quickly.
> >
>
> All I can say is that I greatly appreciate the detailed help!
>
> Tomcat has been put back into BIO mode and has been restarted. I
> confirmed that the debug0.log file is now accumulating data.
>
> I am very eager to see what turns up in the logs Wednesday morning.
>
>
> > Thanks,
> >
> > Mark
> >

The custom patch rotated debug0.log every few minutes and did not keep 
previous copies, so it took me several tries today to catch the problem in the 
act, but I finally did. I emailed you the pcaps and logs off-list.

--Eric



RE: Weirdest Tomcat Behavior Ever?

2020-10-27 Thread Eric Robinson
> From: Mark Thomas 
> Sent: Tuesday, October 27, 2020 12:06 PM
> To: users@tomcat.apache.org
> Subject: Re: Weirdest Tomcat Behavior Ever?
>
> On 27/10/2020 16:29, Eric Robinson wrote:
> >> On 27/10/2020 15:22, Eric Robinson wrote:
>
> 
>
> >>> I had switched to the NIO connector last week. Is that why the logs
> >>> are still
> >> at 0 bytes?
> >>
> >> Yes. I only added the debug logging to BIO.
> >>
> >> Somewhere in a previous thread I recommended switching back to BIO as
> >> the code is simpler and therefore easier to debug.
> >>
> >> Note that my previous analysis of the access log valve entries was
> >> based on the assumption you had switched back to BIO.
> >>
> >> Given everything else is in place, I'd suggest switching back to BIO
> >> when you can, waiting for the issue to re-appear and then looking at the
> debug logs.
> >>
> >> Mark
> >>
> >>
> >
> > Oh man, I must have missed the recommendation to switch back to BIO. I
> will do that ASAP, most likely this evening. My apologies for causing the
> wasted effort.
>
> No worries.
>
> Looking at the NIO code, if you had switched back to BIO last night I'm fairly
> sure we'd now be adding the debug logging to BIO anyway so we'll end up in
> the same place.
>
> Now seems like a good time to point out that, as email based debugging
> sessions go, this is very much at the better end. Having someone as
> responsive as you are, who actually answers the questions asked and is
> willing to run custom patches to collect debug info makes it much more likely
> that not only will we reach a successful conclusion, but that we'll get there
> reasonably quickly.
>

All I can say is that I greatly appreciate the detailed help!

Tomcat has been put back into BIO mode and has been restarted. I confirmed that 
the debug0.log file is now accumulating data.

I am very eager to see what turns up in the logs Wednesday morning.


> Thanks,
>
> Mark
>
> -
> To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
> For additional commands, e-mail: users-h...@tomcat.apache.org



RE: Weirdest Tomcat Behavior Ever?

2020-10-27 Thread Eric Robinson
> On 27/10/2020 15:22, Eric Robinson wrote:
> >> On 27/10/2020 09:16, Mark Thomas wrote:
> >>> On 27/10/2020 04:43, Eric Robinson wrote:
> >>>
> >>> 
> >>>
> >>>>>>>> Any changes in the Nginx configuration in the relevant timescale?
> >>>>>>>>
> >>>>>>
> >>>>>> The last change to the nginx config files was on 8/21. The first
> >>>>>> report of problems from the users in question was on 9/16. There
> >>>>>> is another set of users on a different tomcat instance who
> >>>>>> reported issues around 8/26, 5 days after nginx config change. It
> >>>>>> seems unlikely to be related. Also, I can't imagine what nginx
> >>>>>> could be sending that would induce the upstream tomcat to behave
> this way.
> >>>
> >>> If there is some sort of retaining references to request/response/etc.
> >>> at the root of this then that sort of issue is very sensitive to timing.
> >>> For example, I have had reliable reproduction cases in the past that
> >>> stopped working with the addition of a single debug statement. Any
> >>> sort of configuration change might have changed the timing
> >>> sufficiently to trigger the issue.
> >>>
> >>> At this point, I'd say the Nginx config change might be a potential
> >>> trigger if the root cause is retaining references.
> >>>
> >>>>>>>> Any updates to the application in the relevant timescale?
> >>>>>>>>
> >>>>>>
> >>>>>> Their application was patched to a newer version on 6/5.
> >>>
> >>> That seems far enough away to be unlikely.
> >>>
> >>>>>>>> Any features users started using that hadn't been used before
> >>>>>>>> in that timescale?
> >>>>>>
> >>>>>> That one I couldn't answer, as we are only the hosting facility
> >>>>>> and we are not in the loop when it comes to the users' workflow,
> >>>>>> but it seems unlikely given the nature of their business.
> >>>
> >>> Fair enough. That one was a bit of a shot in the dark.
> >>>
> >>> 
> >>>
> >>>>>> 1. Now that you have provided this patch, should I still enable
> >>>>>> RECYCLE_FACADES=true?
> >>>
> >>> I'd recommend yes. At least until the issue is resolved.
> >>>
> >>>>>> 2. The servers in question are multi-tenanted. There are 17
> >>>>>> instances of tomcat, each running on a different set of ports and
> >>>>>> out of a separate directory tree, but they are all started with
> >>>>>> essentially the same init script, which exports certain
> >>>>>> site-specific path variables and executes tomcat from the
> >>>>>> specified folder structure. Can you think of any potential issues
> >>>>>> where making this change for one instance could have a negative
> >>>>>> effect on any of the other instances? Probably not, but just
> >>>>>> being careful. I will have these changes implemented during our
> >>>>>> nightly maintenance window and will begin to gather relevant logs
> >>>>> first thing tomorrow!
> >>>
> >>> I can't think of any side effects.
> >>>
> >>>>>>
> >>>>>> --Eric
> >>>>>
> >>>>> Mark, the changes have been made per your instructions and tomcat
> >>>>> has been restarted. The debug0.log, and debug0.log.lck files were
> >>>>> created in the directory, but they currently both have 0 bytes.
> >>>
> >>> Drat. That suggests something isn't quite right as the logs should
> >>> start filling up as soon as a single request is made. I'll double
> >>> check my instructions if you could double check your end.
> >>
> >> I've clarified a few things in the instructions and confirmed they
> >> work with my local 7.0.72 build.
> >>
> >> Note: You will need to be using the BIO connector
> >>
> >
> > I had switched to the NIO connector last week. Is that why the logs are 
> > still
> at 0 bytes?
>
> Yes. I only added the debug logging to BIO.
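For anyone wanting to replicate the RECYCLE_FACADES setting discussed above, it is normally passed to the JVM as a system property. A sketch of a per-instance bin/setenv.sh follows (the file name is the standard Tomcat convention; the init scripts on these multi-tenant servers may set options differently):

```sh
# bin/setenv.sh -- sourced by catalina.sh at startup.
# Enable facade recycling so that any code retaining references to
# request/response objects fails fast instead of leaking data
# between requests.
CATALINA_OPTS="$CATALINA_OPTS -Dorg.apache.catalina.connector.RECYCLE_FACADES=true"
export CATALINA_OPTS
```

On startup, the property should then appear in catalina.out as a "Command line argument" INFO line, which is how it was confirmed in this thread.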

RE: Weirdest Tomcat Behavior Ever?

2020-10-27 Thread Eric Robinson
> On 27/10/2020 09:16, Mark Thomas wrote:
> > On 27/10/2020 04:43, Eric Robinson wrote:
> >
> > 
> >
> >>>>>> Any changes in the Nginx configuration in the relevant timescale?
> >>>>>>
> >>>>
> >>>> The last change to the nginx config files was on 8/21. The first
> >>>> report of problems from the users in question was on 9/16. There is
> >>>> another set of users on a different tomcat instance who reported
> >>>> issues around 8/26, 5 days after nginx config change. It seems
> >>>> unlikely to be related. Also, I can't imagine what nginx could be
> >>>> sending that would induce the upstream tomcat to behave this way.
> >
> > If there is some sort of retaining references to request/response/etc.
> > at the root of this then that sort of issue is very sensitive to timing.
> > For example, I have had reliable reproduction cases in the past that
> > stopped working with the addition of a single debug statement. Any
> > sort of configuration change might have changed the timing
> > sufficiently to trigger the issue.
> >
> > At this point, I'd say the Nginx config change might be a potential
> > trigger if the root cause is retaining references.
> >
> >>>>>> Any updates to the application in the relevant timescale?
> >>>>>>
> >>>>
> >>>> Their application was patched to a newer version on 6/5.
> >
> > That seems far enough away to be unlikely.
> >
> >>>>>> Any features users started using that hadn't been used before in
> >>>>>> that timescale?
> >>>>
> >>>> That one I couldn't answer, as we are only the hosting facility and
> >>>> we are not in the loop when it comes to the users' workflow, but it
> >>>> seems unlikely given the nature of their business.
> >
> > Fair enough. That one was a bit of a shot in the dark.
> >
> > 
> >
> >>>> 1. Now that you have provided this patch, should I still enable
> >>>> RECYCLE_FACADES=true?
> >
> > I'd recommend yes. At least until the issue is resolved.
> >
> >>>> 2. The servers in question are multi-tenanted. There are 17
> >>>> instances of tomcat, each running on a different set of ports and
> >>>> out of a separate directory tree, but they are all started with
> >>>> essentially the same init script, which exports certain
> >>>> site-specific path variables and executes tomcat from the specified
> >>>> folder structure. Can you think of any potential issues where
> >>>> making this change for one instance could have a negative effect on
> >>>> any of the other instances? Probably not, but just being careful. I
> >>>> will have these changes implemented during our nightly maintenance
> >>>> window and will begin to gather relevant logs
> >>> first thing tomorrow!
> >
> > I can't think of any side effects.
> >
> >>>>
> >>>> --Eric
> >>>
> >>> Mark, the changes have been made per your instructions and tomcat
> >>> has been restarted. The debug0.log, and debug0.log.lck files were
> >>> created in the directory, but they currently both have 0 bytes.
> >
> > Drat. That suggests something isn't quite right as the logs should
> > start filling up as soon as a single request is made. I'll double
> > check my instructions if you could double check your end.
>
> I've clarified a few things in the instructions and confirmed they work with
> my local 7.0.72 build.
>
> Note: You will need to be using the BIO connector
>

I had switched to the NIO connector last week. Is that why the logs are still 
at 0 bytes?


> Mark
>
> >
> > Konstantin noted there was no source provided. I've pushed the branch
> > to
> > https://github.com/markt-asf/tomcat/tree/debug-7.0.72 so you can see
> > the changes I made.
> >
> >> Also, RECYCLE_FACADES has been enabled and I confirmed that it is
> referenced in the logs as follows...
> >>
> >> INFO: Command line argument:
> >> -Dorg.apache.catalina.connector.RECYCLE_FACADES=true
> >
> > Great.
> >
> > Mark
> >
> > -
> > To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
> > For additional commands, e-mail: users-h...@tomcat.apache.org
> >
>
>
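For readers following along, the BIO/NIO choice discussed in this thread is made by the protocol attribute of the Connector element in server.xml on Tomcat 7. A minimal sketch (the port and timeout values are illustrative, not taken from Eric's configuration):

```xml
<!-- Blocking I/O (BIO) connector -->
<Connector port="8080"
           protocol="org.apache.coyote.http11.Http11Protocol"
           connectionTimeout="20000" />

<!-- Non-blocking I/O (NIO) connector -->
<!--
<Connector port="8080"
           protocol="org.apache.coyote.http11.Http11NioProtocol"
           connectionTimeout="20000" />
-->
```

Note that with the generic protocol="HTTP/1.1" value, Tomcat 7 selects BIO unless the APR/native library is installed, so naming the implementation class explicitly avoids ambiguity.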

RE: Weirdest Tomcat Behavior Ever?

2020-10-26 Thread Eric Robinson
> -Original Message-
> From: Eric Robinson 
> Sent: Monday, October 26, 2020 11:37 PM
> To: Tomcat Users List 
> Subject: RE: Weirdest Tomcat Behavior Ever?
>
> > > On 26/10/2020 10:26, Mark Thomas wrote:
> > > > On 24/10/2020 01:32, Eric Robinson wrote:
> > > >
> > > > 
> > > >
> > > >>>> -Original Message-
> > > >>>> From: Mark Thomas 
> > > >
> > > > 
> > > >
> > > >>>> The failed request:
> > > >>>> - Completes in ~6ms
> > > >>>
> > > >>> I think we've seen the failed requests take as much as 50ms.
> > > >
> > > > Ack. That is still orders of magnitude smaller than the timeout
> > > > and consistent with generation time of some of the larger responses.
> > > >
> > > > I wouldn't say it confirms any of my previous conclusions but it
> > > > doesn't invalidate them either.
> > > >
> > > >>>> Follow-up questions:
> > > >>>> - JVM
> > > >>>>   - Vendor?
> > > >>>>   - OS package or direct from Vendor?
> > > >>>>
> > > >>>
> > > >>> jdk-8u221-linux-x64.tar.gz downloaded from the Oracle web site.
> > > >
> > > > OK. That is post Java 8u202 so it should be a paid for,
> > > > commercially supported version of Java 8.
> > > >
> > > > The latest Java 8 release from Oracle is 8u271.
> > > >
> > > > The latest Java 8 release from AdoptOpenJDK is 8u272.
> > > >
> > > > I don't think we are quite at this point yet but what is your view
> > > > on updating to the latest Java 8 JDK (from either Oracle or
> > AdoptOpenJDK).
> > > >
> > > >>>> - Tomcat
> > > >>>>   - OS package, 3rd-party package or direct from ASF?
> > > >>>>
> > > >>>
> > > >>> tomcat.noarch  7.0.76-6.el7 from CentOS base repository
> > > >>>
> > > >>
> > > >> Drat, slight correction. I now recall that although we initially
> > > >> installed 7.0.76
> > > from the CentOS repo, the application vendor made us lower the
> > > version to 7.0.72, and I DO NOT know where we got that. However, it
> > > has not changed since October-ish of 2018.
> > > >
> > > > I've reviewed the 7.0.72 to 7.0.76 changelog and I don't see any
> > > > relevant changes.
> > > >
> > > >>>> - Config
> > > >>>>   - Any changes at all around the time the problems started? I'm
> > > >>>> thinking OS updates, VM restarted etc?
> > > >>>>
> > > >>>
> > > >>> server.xml has not changed since 4/20/20, which was well before
> > > >>> the problem manifested, and all the other files in the conf
> > > >>> folder are even older than that. We're seeing this symptom on
> > > >>> both production servers. One of them was rebooted a week ago,
> > > >>> but the other has been up continuously for
> > > >>> 258 days.
> > > >
> > > > OK. That rules a few things out which is good but it does make the
> > > > trigger for this issue even more mysterious.
> > > >
> > > > Any changes in the Nginx configuration in the relevant timescale?
> > > >
> >
> > The last change to the nginx config files was on 8/21. The first
> > report of problems from the users in question was on 9/16. There is
> > another set of users on a different tomcat instance who reported
> > issues around 8/26, 5 days after nginx config change. It seems
> > unlikely to be related. Also, I can't imagine what nginx could be
> > sending that would induce the upstream tomcat to behave this way.
> >
> > > > Any updates to the application in the relevant timescale?
> > > >
> >
> > Their application was patched to a newer version on 6/5.
> >
> > > > Any features users started using that hadn't been used before in
> > > > that timescale?
> >
> > That one I couldn't answer, as we are only the hosting facility and we
> > are not in the loop when it comes to the users' workflow, but it seems
> > unlikely given the nature of their business.
> >
> > > >
> > > > 

RE: Weirdest Tomcat Behavior Ever?

2020-10-26 Thread Eric Robinson
> > On 26/10/2020 10:26, Mark Thomas wrote:
> > > On 24/10/2020 01:32, Eric Robinson wrote:
> > >
> > > 
> > >
> > >>>> -Original Message-
> > >>>> From: Mark Thomas 
> > >
> > > 
> > >
> > >>>> The failed request:
> > >>>> - Completes in ~6ms
> > >>>
> > >>> I think we've seen the failed requests take as much as 50ms.
> > >
> > > Ack. That is still orders of magnitude smaller than the timeout and
> > > consistent with generation time of some of the larger responses.
> > >
> > > I wouldn't say it confirms any of my previous conclusions but it
> > > doesn't invalidate them either.
> > >
> > >>>> Follow-up questions:
> > >>>> - JVM
> > >>>>   - Vendor?
> > >>>>   - OS package or direct from Vendor?
> > >>>>
> > >>>
> > >>> jdk-8u221-linux-x64.tar.gz downloaded from the Oracle web site.
> > >
> > > OK. That is post Java 8u202 so it should be a paid for, commercially
> > > supported version of Java 8.
> > >
> > > The latest Java 8 release from Oracle is 8u271.
> > >
> > > The latest Java 8 release from AdoptOpenJDK is 8u272.
> > >
> > > I don't think we are quite at this point yet but what is your view
> > > on updating to the latest Java 8 JDK (from either Oracle or
> AdoptOpenJDK).
> > >
> > >>>> - Tomcat
> > >>>>   - OS package, 3rd-party package or direct from ASF?
> > >>>>
> > >>>
> > >>> tomcat.noarch  7.0.76-6.el7 from CentOS base repository
> > >>>
> > >>
> > >> Drat, slight correction. I now recall that although we initially
> > >> installed 7.0.76
> > from the CentOS repo, the application vendor made us lower the version
> > to 7.0.72, and I DO NOT know where we got that. However, it has not
> > changed since October-ish of 2018.
> > >
> > > I've reviewed the 7.0.72 to 7.0.76 changelog and I don't see any
> > > relevant changes.
> > >
> > >>>> - Config
> > >>>>   - Any changes at all around the time the problems started? I'm
> > >>>> thinking OS updates, VM restarted etc?
> > >>>>
> > >>>
> > >>> server.xml has not changed since 4/20/20, which was well before
> > >>> the problem manifested, and all the other files in the conf folder
> > >>> are even older than that. We're seeing this symptom on both
> > >>> production servers. One of them was rebooted a week ago, but the
> > >>> other has been up continuously for
> > >>> 258 days.
> > >
> > > OK. That rules a few things out which is good but it does make the
> > > trigger for this issue even more mysterious.
> > >
> > > Any changes in the Nginx configuration in the relevant timescale?
> > >
>
> The last change to the nginx config files was on 8/21. The first report of
> problems from the users in question was on 9/16. There is another set of
> users on a different tomcat instance who reported issues around 8/26, 5 days
> after nginx config change. It seems unlikely to be related. Also, I can't
> imagine what nginx could be sending that would induce the upstream tomcat
> to behave this way.
>
> > > Any updates to the application in the relevant timescale?
> > >
>
> Their application was patched to a newer version on 6/5.
>
> > > Any features users started using that hadn't been used before in
> > > that timescale?
>
> That one I couldn't answer, as we are only the hosting facility and we are not
> in the loop when it comes to the users' workflow, but it seems unlikely given
> the nature of their business.
>
> > >
> > > 
> > >
> > >>>> Recommendations:
> > >>>> - Switch back to the BIO connector if you haven't already. It has fewer
> > >>>>   moving parts than NIO so it is simpler to debug.
> > >>>> - Add "%b" to the access log pattern for Tomcat's access log valve to
> > >>>>   record the number of body (excluding headers) bytes Tomcat
> > >>>> believes
> > it
> > >>>>   has written to the response.
> > >>>>
> > >>>>
> > >>>> Next steps:
> > >>>> - Wait for the issue to re-occur after the recommended changes above
> > >>>>   and depending on what is recorded in the access log for %b for a
> > >>>>   failed request, shift the focus accordingly.
> > >>>> - Answers to the additional questions would be nice but the access log
> > >>>>   %b value for a failed request is the key piece of information
> > >>>>   required at this point.

RE: Weirdest Tomcat Behavior Ever?

2020-10-26 Thread Eric Robinson
> On 26/10/2020 10:26, Mark Thomas wrote:
> > On 24/10/2020 01:32, Eric Robinson wrote:
> >
> > 
> >
> >>>> -Original Message-
> >>>> From: Mark Thomas 
> >
> > 
> >
> >>>> The failed request:
> >>>> - Completes in ~6ms
> >>>
> >>> I think we've seen the failed requests take as much as 50ms.
> >
> > Ack. That is still orders of magnitude smaller than the timeout and
> > consistent with generation time of some of the larger responses.
> >
> > I wouldn't say it confirms any of my previous conclusions but it
> > doesn't invalidate them either.
> >
> >>>> Follow-up questions:
> >>>> - JVM
> >>>>   - Vendor?
> >>>>   - OS package or direct from Vendor?
> >>>>
> >>>
> >>> jdk-8u221-linux-x64.tar.gz downloaded from the Oracle web site.
> >
> > OK. That is post Java 8u202 so it should be a paid for, commercially
> > supported version of Java 8.
> >
> > The latest Java 8 release from Oracle is 8u271.
> >
> > The latest Java 8 release from AdoptOpenJDK is 8u272.
> >
> > I don't think we are quite at this point yet but what is your view on
> > updating to the latest Java 8 JDK (from either Oracle or AdoptOpenJDK).
> >
> >>>> - Tomcat
> >>>>   - OS package, 3rd-party package or direct from ASF?
> >>>>
> >>>
> >>> tomcat.noarch  7.0.76-6.el7 from CentOS base repository
> >>>
> >>
> >> Drat, slight correction. I now recall that although we initially installed 
> >> 7.0.76
> from the CentOS repo, the application vendor made us lower the version to
> 7.0.72, and I DO NOT know where we got that. However, it has not changed
> since October-ish of 2018.
> >
> > I've reviewed the 7.0.72 to 7.0.76 changelog and I don't see any
> > relevant changes.
> >
> >>>> - Config
> >>>>   - Any changes at all around the time the problems started? I'm
> >>>> thinking OS updates, VM restarted etc?
> >>>>
> >>>
> >>> server.xml has not changed since 4/20/20, which was well before the
> >>> problem manifested, and all the other files in the conf folder are
> >>> even older than that. We're seeing this symptom on both production
> >>> servers. One of them was rebooted a week ago, but the other has been
> >>> up continuously for
> >>> 258 days.
> >
> > OK. That rules a few things out which is good but it does make the
> > trigger for this issue even more mysterious.
> >
> > Any changes in the Nginx configuration in the relevant timescale?
> >

The last change to the nginx config files was on 8/21. The first report of 
problems from the users in question was on 9/16. There is another set of users 
on a different tomcat instance who reported issues around 8/26, 5 days after 
nginx config change. It seems unlikely to be related. Also, I can't imagine 
what nginx could be sending that would induce the upstream tomcat to behave 
this way.

> > Any updates to the application in the relevant timescale?
> >

Their application was patched to a newer version on 6/5.

> > Any features users started using that hadn't been used before in that
> > timescale?

That one I couldn't answer, as we are only the hosting facility and we are not 
in the loop when it comes to the users' workflow, but it seems unlikely given 
the nature of their business.

> >
> > 
> >
> >>>> Recommendations:
> >>>> - Switch back to the BIO connector if you haven't already. It has fewer
> >>>>   moving parts than NIO so it is simpler to debug.
> >>>> - Add "%b" to the access log pattern for Tomcat's access log valve to
> >>>>   record the number of body (excluding headers) bytes Tomcat believes
> it
> >>>>   has written to the response.
> >>>>
> >>>>
> >>>> Next steps:
> >>>> - Wait for the issue to re-occur after the recommended changes above
> and
> >>>>   depending on what is recorded in the access log for %b for a failed
> >>>>   request, shift the focus accordingly.
> >>>> - Answers to the additional questions would be nice but the access log
> >>>>   %b value for a failed request is the key piece of information required
> >>>>   at this point.
> >>>>
> >>>
> >>> Good news! I enabled that parameter a few days 
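Once %b is in the access log pattern, the failed requests discussed above should show up as entries with a zero (or "-") byte count. A quick way to scan for them — a hypothetical script that assumes a pattern ending in '%s %b'; adjust the regex to the valve's actual format:

```python
import re

# Matches the tail of a common access-log layout: "...request" status bytes
# (assumed pattern: '%h %l %u %t "%r" %s %b').
LINE_RE = re.compile(r'"(?P<request>[^"]*)"\s+(?P<status>\d{3})\s+(?P<bytes>\d+|-)\s*$')

def zero_byte_responses(lines):
    """Return (request, status) pairs whose %b field is 0 or '-'."""
    hits = []
    for line in lines:
        m = LINE_RE.search(line)
        if m and m.group("bytes") in ("0", "-"):
            hits.append((m.group("request"), m.group("status")))
    return hits

# Two illustrative log lines: a normal response and a suspect empty one.
sample = [
    '10.0.0.5 - - [27/Oct/2020:09:00:01 -0500] "GET /app/page.jsp HTTP/1.0" 200 1043',
    '10.0.0.5 - - [27/Oct/2020:09:00:02 -0500] "GET /app/page.jsp HTTP/1.0" 200 0',
]
print(zero_byte_responses(sample))  # -> [('GET /app/page.jsp HTTP/1.0', '200')]
```

A "200 with 0 bytes" entry would point at Tomcat believing it wrote nothing, which is exactly the distinction Mark's recommendation is designed to surface.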

RE: Weirdest Tomcat Behavior Ever?

2020-10-23 Thread Eric Robinson
> -Original Message-
> From: Eric Robinson 
> Sent: Friday, October 23, 2020 7:09 PM
> To: Tomcat Users List 
> Subject: RE: Weirdest Tomcat Behavior Ever?
>
> Hi Mark --
>
> Thanks tons for digging into this. See my answers below.
>
> > -Original Message-
> > From: Mark Thomas 
> > Sent: Friday, October 23, 2020 5:09 PM
> > To: users@tomcat.apache.org
> > Subject: Re: Weirdest Tomcat Behavior Ever?
> >
> > Hi Eric (and those following along),
> >
> > Eric sent me some network captures off-list from multiple points in
> > the network from the system where this is happening. The following is
> > the summary so far:
> >
> > Overview:
> > A small number of requests are receiving a completely empty (no
> > headers, no body) response.
> >
> >
> > Information Gathering:
> > Successful requests that are similar to the failed request:
> > - Take 7ms to 13ms to complete
> > - Have relatively small responses (~1k)
> > - Use HTTP/1.0
> > - Do not use keep-alive (request has Connection: close header)
> > - The request target is a JSP
> >
> > The failed request:
> > - Completes in ~6ms
>
> I think we've seen the failed requests take as much as 50ms.
>
> > - Has no response headers or body
> > - Records a successful entry in the access log
> >
> > System:
> > Tomcat 7.0.76, BIO HTTP connector
> > Java 1.8.0_221
> > CentOS 7.5 server running in Azure
> >
> > Requests are received from an nginx reverse proxy. It looks like nginx
> > is using
> > HTTP/1.0 without keep-alive to proxy requests to Tomcat. This actually
> > makes things a little easier as we have one TCP/IP connection per request.
> >
> > After switching to NIO, the issue is still observable (info received
> > off-list along with access to network traces etc.).
> >
> > The issue appeared ~1 month ago after running without error since
> > October 2018. No known changes were made ~1 month ago.
> >
> > The TCP sequence numbers show that, as far as the network stack is
> > concerned, Tomcat did not write any data before closing the connection
> > cleanly.
> >
> > There is no other activity on the client port associated with the
> > failed request in the provided trace.
> >
> > The failed request does not appear to violate the HTTP specification.
> >
> > Neither the request nor the response are compressed.
> >
> > No WebSocket or other HTTP upgrade requests present in the network
> > traces.
> >
> > No obviously relevant bugs fixed since 7.0.76.
> >
> >
> > Follow-up questions:
> > - JVM
> >   - Vendor?
> >   - OS package or direct from Vendor?
> >
>
> jdk-8u221-linux-x64.tar.gz downloaded from the Oracle web site.
>
>
> > - Tomcat
> >   - OS package, 3rd-party package or direct from ASF?
> >
>
> tomcat.noarch  7.0.76-6.el7 from CentOS base repository
>

Drat, slight correction. I now recall that although we initially installed 
7.0.76 from the CentOS repo, the application vendor made us lower the version 
to 7.0.72, and I DO NOT know where we got that. However, it has not changed 
since October-ish of 2018.

> > - Config
> >   - Any changes at all around the time the problems started? I'm
> > thinking OS updates, VM restarted etc?
> >
>
> server.xml has not changed since 4/20/20, which was well before the
> problem manifested, and all the other files in the conf folder are even older
> than that. We're seeing this symptom on both production servers. One of
> them was rebooted a week ago, but the other has been up continuously for
> 258 days.
>
> >
> > Conclusions:
> > - It isn't timeout related. The request is completing in orders of
> >   magnitude less time than the timeout.
> >
> > - Based on the timings it looks like the JSP is processing the request
> >   and generating the response.
> >
> > - It happens with BIO so sendfile isn't a factor.
> >
> > - No compression so GZIP issues aren't a factor.
> >
> > - Given that the issue occurs with both BIO and NIO that rules out a bug
> >   in any BIO or NIO specific code. Note that while 7.0.x has largely
> >   separate code for BIO and NIO there are still significant sections of
> >   code that are essentially identical so it isn't quite as simple as
> >   just ruling out all the code in the BIO and NIO specific classes.
> >   It also makes a JVM issue seem less likely at this point.
> >
> >
> > Current thinking:
> > - I can think of various ways this might be happening but they all seem
> >   pretty unlikely. The next steps will be to enable existing logging
> >   (and potentially add some custom logging) to try and narrow down where
> >   the response data is disappearing.

RE: Weirdest Tomcat Behavior Ever?

2020-10-23 Thread Eric Robinson
Hi Mark --

Thanks tons for digging into this. See my answers below.

> -Original Message-
> From: Mark Thomas 
> Sent: Friday, October 23, 2020 5:09 PM
> To: users@tomcat.apache.org
> Subject: Re: Weirdest Tomcat Behavior Ever?
>
> Hi Eric (and those following along),
>
> Eric sent me some network captures off-list from multiple points in the
> network from the system where this is happening. The following is the
> summary so far:
>
> Overview:
> A small number of requests are receiving a completely empty (no headers,
> no body) response.
>
>
> Information Gathering:
> Successful requests that are similar to the failed request:
> - Take 7ms to 13ms to complete
> - Have relatively small responses (~1k)
> - Use HTTP/1.0
> - Do not use keep-alive (request has Connection: close header)
> - The request target is a JSP
>
> The failed request:
> - Completes in ~6ms

I think we've seen the failed requests take as much as 50ms.

> - Has no response headers or body
> - Records a successful entry in the access log
>
> System:
> Tomcat 7.0.76, BIO HTTP connector
> Java 1.8.0_221
> CentOS 7.5 server running in Azure
>
> Requests are received from an nginx reverse proxy. It looks like nginx is 
> using
> HTTP/1.0 without keep-alive to proxy requests to Tomcat. This actually
> makes things a little easier as we have one TCP/IP connection per request.
>
> After switching to NIO, the issue is still observable (info received off-list 
> along
> with access to network traces etc.).
>
> The issue appeared ~1 month ago after running without error since October
> 2018. No known changes were made ~1 month ago.
>
> The TCP sequence numbers show that, as far as the network stack is
> concerned, Tomcat did not write any data before closing the connection
> cleanly.
>
> There is no other activity on the client port associated with the failed 
> request
> in the provided trace.
>
> The failed request does not appear to violate the HTTP specification.
>
> Neither the request nor the response are compressed.
>
> No WebSocket or other HTTP upgrade requests present in the network
> traces.
>
> No obviously relevant bugs fixed since 7.0.76.
>
>
> Follow-up questions:
> - JVM
>   - Vendor?
>   - OS package or direct from Vendor?
>

jdk-8u221-linux-x64.tar.gz downloaded from the Oracle web site.


> - Tomcat
>   - OS package, 3rd-party package or direct from ASF?
>

tomcat.noarch  7.0.76-6.el7 from CentOS base repository

> - Config
>   - Any changes at all around the time the problems started? I'm
> thinking OS updates, VM restarted etc?
>

server.xml has not changed since 4/20/20, which was well before the problem 
manifested, and all the other files in the conf folder are even older than 
that. We're seeing this symptom on both production servers. One of them was 
rebooted a week ago, but the other has been up continuously for 258 days.

>
> Conclusions:
> - It isn't timeout related. The request is completing in orders of
>   magnitude less time than the timeout.
>
> - Based on the timings it looks like the JSP is processing the request
>   and generating the response.
>
> - It happens with BIO so sendfile isn't a factor.
>
> - No compression so GZIP issues aren't a factor.
>
> - Given that the issue occurs with both BIO and NIO that rules out a bug
>   in any BIO or NIO specific code. Note that while 7.0.x has largely
>   separate code for BIO and NIO there are still significant sections of
>   code that are essentially identical so it isn't quite as simple as
>   just ruling out all the code in the BIO and NIO specific classes.
>   It also makes a JVM issue seem less likely at this point.
>
>
> Current thinking:
> - I can think of various ways this might be happening but they all seem
>   pretty unlikely. The next steps will be to enable existing logging
>   (and potentially add some custom logging) to try and narrow down where
>   the response data is disappearing.
>
> - Having reviewed the BIO code, there is a mercifully simple way to see
>   how many bytes Tomcat thinks it has written to the response. The "%b"
>   pattern in the access log valve will show how many bytes of the
>   response body Tomcat has written to the network socket without an
>   IOException. I'd prefer something that recorded header bytes as well
>   but this is a good first step and doesn't require custom patches.
>
>
> Recommendations:
> - Switch back to the BIO connector if you haven't already. It has fewer
>   moving parts than NIO so it is simpler to debug.
> - Add "%b" to the access log pattern for Tomcat's access log valve to
>   record the number of body (excluding headers) bytes Tomcat believes it
>   has written to the response.
>
>
> Next steps:
> - Wait for the issue to re-occur after the recommended changes above and
>   depending on what is recorded in the access log for %b for a failed
>   request, shift the focus accordingly.
> - Answers to the additional questions would be nice but the access log
>   %b value for a failed 
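
[Editor's note] The %b recommendation above corresponds to an AccessLogValve entry along these lines in server.xml. This is a sketch only: the valve class name and the %b token are standard Tomcat, but the surrounding pattern tokens and the directory/prefix/suffix values are illustrative, not taken from this thread.

```xml
<!-- Sketch: add %b to whatever access log pattern is already in use.
     %h = remote host, %t = timestamp, %r = first line of the request,
     %s = HTTP status, %b = response body bytes sent (excluding
     headers), logged as "-" if zero. Directory/prefix/suffix values
     below are examples. -->
<Valve className="org.apache.catalina.valves.AccessLogValve"
       directory="logs"
       prefix="localhost_access_log." suffix=".txt"
       pattern="%h %t &quot;%r&quot; %s %b" />
```

With this in place, a failed request that logs `%b` as the expected body size would suggest Tomcat handed the bytes to the network stack; `0` or `-` would point back at the response never being written.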

RE: Weirdest Tomcat Behavior Ever?

2020-10-16 Thread Eric Robinson
> -Original Message-
> From: Mark Thomas 
> Sent: Friday, October 16, 2020 8:02 AM
> To: users@tomcat.apache.org
> Subject: Re: Weirdest Tomcat Behavior Ever?
>
> On 16/10/2020 12:37, Eric Robinson wrote:
> >> From: Mark Thomas 
>
> 
>
> >> I'd like to see those screen shots please. Better still would be
> >> access to the captures themselves (just the relevant connections not
> >> the whole thing). I believe what you are telling us but long
> >> experience tells me it is best to double check the original data as well.
> >>
> >
> > I'll send you a link to the screen shot first, then I'll package up the 
> > captures
> and send a link to that in a bit. As the files may contain somewhat sensitive
> information, I'll send a secure mail direct to your inbox.
>
> Thanks. The screenshots didn't shed any light on this so far.
>
> >> I have observed something similar ish in the CI systems. In that case
> >> it is the requests that disappear. Client side logging shows the
> >> request was made but there is no sign of it ever being received by
> >> Tomcat. I don't have network traces for that (yet) so I'm not sure where
> the data is going missing.
> >>
> >> I am beginning to suspect there is a hard to trigger Tomcat or JVM
> >> bug here. I think a Tomcat bug is more likely although I have been
> >> over the code several times and I don't see anything.
> >>
> >
> > I'm thinking a bug of some kind, too, but I've been hosting about 1800
> instances of tomcat for 15 years and I have never seen this behavior before.
> >
> >> A few more questions:
> >>
> >
> > This is where I will begin to struggle a bit.
> >
> >> Which HTTP connector are you using? BIO, NIO or APR/Native?
> >>
> >
> > I believe BIO is the default? server.xml just says...
> >
> >  >connectionTimeout="2"
> >redirectPort="8443" />
>
> That will be BIO or APR/Native depending on whether you have Tomcat
> Native installed. If you look at the logs for when Tomcat starts you should 
> see
> something like:
>
> INFO: Initializing ProtocolHandler ["http-bio-3016"] or
> INFO: Initializing ProtocolHandler ["http-apr-3016"]
>
> What do you see between the square brackets?

["http-bio-3016"]

>
> >> Is the issue reproducible if you switch to a different connector?
> >>
> >
> > In 15 years of using tomcat in production, we've never tried switching the
> connector type. (Probably because the app vendor never suggested it.) I did
> a little research and I'm beginning to think about the pros/cons.
>
> If you wanted to try this, I'd recommend:
>
> protocol="org.apache.coyote.http11.Http11NioProtocol"
>

We're in the middle of a production day so I want to avoid restarting tomcat if 
I can, but I'll plan to change that tonight.
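
[Editor's note] Applied to the Connector shown earlier, Mark's suggested switch would look roughly like this. A sketch only: the port is taken from the "http-bio-3016" log line quoted above, and the connectionTimeout value is an assumption (the Tomcat default of 20 seconds), since the original value was mangled in the archive.

```xml
<!-- Sketch: same Connector, with the protocol pinned to NIO instead
     of the auto-selected BIO/APR. Port 3016 comes from the
     "http-bio-3016" ProtocolHandler log line; connectionTimeout of
     20000 ms is an assumed value, not the one from the thread. -->
<Connector port="3016"
           protocol="org.apache.coyote.http11.Http11NioProtocol"
           connectionTimeout="20000"
           redirectPort="8443" />
```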

> >> How easy is it for you to reproduce this issue?
> >>
> >
> > It's not reproducible at will but it happens frequently enough that we don't
> have to wait long for it to happen. I have wireshark capturing to disk
> continuously and rotating the files at 10 minute intervals to keep them
> smallish. Then I just tail the logs and wait.
>
> Ack.
>
> >> How are you linking the request you see in the access log with the
> >> request you see in Wireshark?
> >
> > Aside from the timestamp of the packets and the timestamp of the tomcat
> log messages, each HTTP request also contains a high-resolution timestamp
> and a unique random number. That way, even if the same request occurs
> multiple times in rapid succession, we can still isolate the exact one that
> failed.
>
> Excellent.
>
> >> How comfortable are you running a patched version of Tomcat (drop
> >> class files provided by me into $CATALINA_BASE/lib in the right
> >> directory structure and restart Tomcat)? Just thinking ahead about
> >> collecting additional debug information.
> >
> > That would be tricky in our production environment, but the users are
> getting desperate enough that we'd be willing to explore that approach.
>
> Understood.
>
> Some other questions that have come to mind:
>
> - Has this app always had this problem?
>

No, it's been running fine in this environment since October 2018.

> - If not, when did it start and what changed at that point (JVM version,
> Tomcat version etc)
>

This is a new thing in the past month or so, but we can't think of what might 
have changed. There are 2 Linux tomc

RE: Weirdest Tomcat Behavior Ever?

2020-10-16 Thread Eric Robinson
> -Original Message-
> From: Mark Thomas 
> Sent: Friday, October 16, 2020 5:17 AM
> To: users@tomcat.apache.org
> Subject: Re: Weirdest Tomcat Behavior Ever?
>
> On 16/10/2020 10:05, Eric Robinson wrote:
> > Hi Mark --
> >
> > Those are great questions. See answers below.
> >
> >
> >> -Original Message-
> >> From: Mark Thomas 
> >> Sent: Friday, October 16, 2020 2:20 AM
> >> To: users@tomcat.apache.org
> >> Subject: Re: Weirdest Tomcat Behavior Ever?
> >>
> >> On 16/10/2020 00:27, Eric Robinson wrote:
> >>
> >> 
> >>
> >>> The localhost_access log shows a request received and an HTTP 200
> >> response sent, as follows...
> >>>
> >>> 10.51.14.133 [15/Oct/2020:12:52:45 -0400] 57 GET
> >>> /app/code.jsp?gizmoid=64438=5=2020-10-
> >> 15
> >>>
> >>
> lterId=0=0=71340=321072
> >> e
> >>> ssid=40696=0.0715816=15102020125245.789063
> HTTP/1.0
> >>> ?gizmoid=64438=5=2020-10-
> >> 15=0
> >>>
> >>
> ionDID=0=71340=321072=40696&
> >> rn
> >>> d2=0.0715816=15102020125245.789063 200
> >>>
> >>> But WireShark shows what really happened. The server received the
> >>> GET
> >> request, and then it sent a FIN to terminate the connection. So if
> >> tomcat sent an HTTP response, it did not make it out the Ethernet card.
> >>>
> >>> Is this the weirdest thing or what? Ideas would sure be appreciated!
> >>
> >> I am assuming there is a typo in your Java version and you are using Java 
> >> 8.
> >>
> >
> > Yes, Java 8.
> >
> >> That Tomcat version is over 3.5 years old (and Tomcat 7 is EOL in
> >> less than 6 months). If you aren't already planning to upgrade (I'd
> >> suggest to 9.0.x) then you might want to start thinking about it.
> >>
> >
> > Vendor constraint. It's a canned application published by a national
> software company, and they have not officially approved tomcat 8 for use on
> Linux yet.
> >
> >> I have a few ideas about what might be going on but rather than fire
> >> out random theories I have some questions that might help narrow things
> down.
> >>
> >> 1. If this request was successful, how big is the response?
> >>
> >
> > 1035 bytes.
> >
> >> 2. If this request was successful, how long would it typically take
> >> to complete?
> >>
> >
> > Under 60 ms.
> >
> >> 3. Looking at the Wireshark trace for a failed request, how long
> >> after the last byte of the request is sent by the client does Tomcat send
> the FIN?
> >>
> >
> > Maybe 100 microseconds.
> >
> >> 4. Looking at the Wireshark trace for a failed request, is the
> >> request fully sent (including terminating CRLF etc)?
> >>
> >
> > Yes, the request as seen by the tomcat server is complete and is
> terminated by 0D 0A.
> >
> >> 5. Are there any proxies, firewalls etc between the user agent and
> Tomcat?
> >>
> >
> > User agent -> firewall -> nginx plus -> upstream tomcat servers
> >
> >> 6. What timeouts are configured for the Connector?
> >>
> >
> > Sorry, which connector are you referring to?
> >
> >> 7. Is this HTTP/1.1, HTTP/2, AJP, with or without TLS?
> >>
> >
> > HTTP/1.1
> >
> >> 8. Where are you running Wireshark? User agent? Tomcat? Somewhere
> >> else?
> >
> > On the nginx proxy and both upstream tomcat servers. (On the user
> > agent, too, but that doesn't help us in this case.)
> >
> > If you would like to see a screen shot showing all 4 captures side-by-side, 
> > I
> can send you a secure link. It will verify my answers above. It shows 4
> separate WireShark captures taken simultaneously:
> >
> > (a) the request going from the nginx proxy to tomcat 1
> > (b) tomcat 1 receiving the request and terminating the connection
> > (c) nginx sending the request to tomcat 2
> > (d) tomcat 2 replying to the request (but the reply does not help the user
> because the tomcat server does not recognize the user agent's JSESSIONID
> cookie, so it responds "invalid session.")
>
> Hmm.
>
> That rules out most of my ideas.
>
> I'd like to see those screen shots please. Better still would be access to the
> captures themselves (just the relevant connections not the whole thing). I

RE: Weirdest Tomcat Behavior Ever?

2020-10-16 Thread Eric Robinson
> > 6. What timeouts are configured for the Connector?
> >
>
> Sorry, which connector are you referring to?
>

Sorry, I'm a dummy. Obviously you mean the tomcat connector.

connectionTimeout="2"

-Eric
Disclaimer : This email and any files transmitted with it are confidential and 
intended solely for intended recipients. If you are not the named addressee you 
should not disseminate, distribute, copy or alter this email. Any views or 
opinions presented in this email are solely those of the author and might not 
represent those of Physician Select Management. Warning: Although Physician 
Select Management has taken reasonable precautions to ensure no viruses are 
present in this email, the company cannot accept responsibility for any loss or 
damage arising from the use of this email or attachments.

-
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org



RE: Weirdest Tomcat Behavior Ever?

2020-10-16 Thread Eric Robinson
Hi Mark --

Those are great questions. See answers below.


> -Original Message-
> From: Mark Thomas 
> Sent: Friday, October 16, 2020 2:20 AM
> To: users@tomcat.apache.org
> Subject: Re: Weirdest Tomcat Behavior Ever?
>
> On 16/10/2020 00:27, Eric Robinson wrote:
>
> 
>
> > The localhost_access log shows a request received and an HTTP 200
> response sent, as follows...
> >
> > 10.51.14.133 [15/Oct/2020:12:52:45 -0400] 57 GET
> > /app/code.jsp?gizmoid=64438=5=2020-10-
> 15
> >
> lterId=0=0=71340=321072
> e
> > ssid=40696=0.0715816=15102020125245.789063 HTTP/1.0
> > ?gizmoid=64438=5=2020-10-
> 15=0
> >
> ionDID=0=71340=321072=40696&
> rn
> > d2=0.0715816=15102020125245.789063 200
> >
> > But WireShark shows what really happened. The server received the GET
> request, and then it sent a FIN to terminate the connection. So if tomcat sent
> an HTTP response, it did not make it out the Ethernet card.
> >
> > Is this the weirdest thing or what? Ideas would sure be appreciated!
>
> I am assuming there is a typo in your Java version and you are using Java 8.
>

Yes, Java 8.

> That Tomcat version is over 3.5 years old (and Tomcat 7 is EOL in less than 6
> months). If you aren't already planning to upgrade (I'd suggest to 9.0.x) then
> you might want to start thinking about it.
>

Vendor constraint. It's a canned application published by a national software 
company, and they have not officially approved tomcat 8 for use on Linux yet.

> I have a few ideas about what might be going on but rather than fire out
> random theories I have some questions that might help narrow things down.
>
> 1. If this request was successful, how big is the response?
>

1035 bytes.

> 2. If this request was successful, how long would it typically take to
> complete?
>

Under 60 ms.

> 3. Looking at the Wireshark trace for a failed request, how long after the 
> last
> byte of the request is sent by the client does Tomcat send the FIN?
>

Maybe 100 microseconds.

> 4. Looking at the Wireshark trace for a failed request, is the request fully 
> sent
> (including terminating CRLF etc)?
>

Yes, the request as seen by the tomcat server is complete and is terminated by 
0D 0A.

> 5. Are there any proxies, firewalls etc between the user agent and Tomcat?
>

User agent -> firewall -> nginx plus -> upstream tomcat servers

> 6. What timeouts are configured for the Connector?
>

Sorry, which connector are you referring to?

> 7. Is this HTTP/1.1, HTTP/2, AJP, with or without TLS?
>

HTTP/1.1

> 8. Where are you running Wireshark? User agent? Tomcat? Somewhere
> else?

On the nginx proxy and both upstream tomcat servers. (On the user agent, too, 
but that doesn't help us in this case.)

If you would like to see a screen shot showing all 4 captures side-by-side, I 
can send you a secure link. It will verify my answers above. It shows 4 
separate WireShark captures taken simultaneously:

(a) the request going from the nginx proxy to tomcat 1
(b) tomcat 1 receiving the request and terminating the connection
(c) nginx sending the request to tomcat 2
(d) tomcat 2 replying to the request (but the reply does not help the user 
because the tomcat server does not recognize the user agent's JSESSIONID 
cookie, so it responds "invalid session.")

>
> Mark
>
> -
> To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
> For additional commands, e-mail: users-h...@tomcat.apache.org




Weirdest Tomcat Behavior Ever?

2020-10-15 Thread Eric Robinson
Has anyone ever seen a situation where tomcat occasionally fails to send 
responses but still logs them?

On a CentOS 7.5 server running in Azure with tomcat 7.0.76 with java 1.0.0_221, 
everything runs fine 99.99% of the time, but that last hundredth of a percent 
is a bitch. Intermittently, the server receives a request and then just 
terminates the connection without responding. But the localhost_access log 
shows that it DID respond.

Let me say that again.

The localhost_access log shows a request received and an HTTP 200 response 
sent, as follows...

10.51.14.133 [15/Oct/2020:12:52:45 -0400] 57 GET 
/app/code.jsp?gizmoid=64438=5=2020-10-15=0=0=71340=321072=40696=0.0715816=15102020125245.789063
 HTTP/1.0 
?gizmoid=64438=5=2020-10-15=0=0=71340=321072=40696=0.0715816=15102020125245.789063
 200

But WireShark shows what really happened. The server received the GET request, 
and then it sent a FIN to terminate the connection. So if tomcat sent an HTTP 
response, it did not make it out the Ethernet card.

Is this the weirdest thing or what? Ideas would sure be appreciated!

-Eric



RE: Tomcat Processing Timer Question

2020-09-13 Thread Eric Robinson
We use LVS+ldirectord. It does not provide the kind of logs you're referring to.

> -Original Message-
> From: Martin Grigorov 
> Sent: Saturday, September 12, 2020 12:03 AM
> To: Tomcat Users List 
> Subject: Re: Tomcat Processing Timer Question
>
> Hi,
>
> On Sat, Sep 12, 2020, 02:57 Eric Robinson 
> wrote:
>
> > I'm not sure what you mean by measuring at the load balancer level.
> > We're using the jasper logs and those only exist on the tomcat server
> > itself. I must be misunderstanding your meaning.
> >
>
> He meant to use the LB's logs for the same.
> What software do you use for load balancing?
>
>
> > Get Outlook for Android<https://aka.ms/ghei36>
> >
> > 
> > From: Christopher Schultz 
> > Sent: Thursday, September 10, 2020 3:11:43 PM
> > To: users@tomcat.apache.org 
> > Subject: Re: Tomcat Processing Timer Question
> >
> > -BEGIN PGP SIGNED MESSAGE-
> > Hash: SHA256
> >
> > Eric,
> >
> > On 9/10/20 15:29, Eric Robinson wrote:
> > > Chris --
> > >
> > >
> > >> You should also look at worker-thread availability. When you see
> > >> these "high latency" (which is usually a term reserved for I/O
> > >> characterization) events, do you have:>> 1. Available worker
> > >> threads (from the executor thread pool) 2. Any other shared/limited
> > >> resource (e.g. DB connection pool)
> > >>
> > >
> > > Good thought. I should mention that the hosted application is
> > > canned, and is the same for all tomcat instances, with only minor
> > > variations in version between them. User workflow is also similar.
> > > Over the years we've developed a good feel for expected performance
> > > and resource utilization based on the user count per instance. So
> > > when one instance exhibits anomalous performance, we tend to go
> > > right to networking issues.
> > >
> > >> Also, are you seeing the otherwise unexpected slowness on each
> > >> Tomcat node, or are you seeing it at the load-balancer/multiplexer
> > >> level?
> > >>
> > >
> > > We run multi-tenanted servers, with many instances of tomcat on each
> > > server. We've never seen issues at the load-balancer.
> >
> > What I mean is, are you measuring the request at the Tomcat level, or
> > at the load-balancer level? If you are watching at the lb, then your
> > lb might pick a "busy" Tomcat and the request has to "wait in line"
> > before processing even begins. If you sample at the Tomcat level,
> > you'll see no discernible slowdown because the time "waiting in line" does
> not count.
> >
> > > Very occasionally, there might be a problem at the server level.
> > > When that happens, all instances on that server may become sluggish.
> > > What I'm talking about in this thread are cases where only one
> > > instance on a server is showing slowness in its jasper logs. Also,
> > > we typically do not see the same slowness when we test the
> > > application locally from the same network. I've had my eye on TCP
> > > retransmits as a possible culprit for a while, but I just didn't
> > > know for sure if my understanding of the tomcat processing timer is
> > > correct.
> > I hope we've cleared that up for you, then.
> >
> > You might also want to read about "buffer bloat" if you aren't already
> > familiar with that term.
> >
> > - -chris

Re: Tomcat Processing Timer Question

2020-09-11 Thread Eric Robinson
I'm not sure what you mean by measuring at the load balancer level. We're using 
the jasper logs and those only exist on the tomcat server itself. I must be 
misunderstanding your meaning.

Get Outlook for Android<https://aka.ms/ghei36>


From: Christopher Schultz 
Sent: Thursday, September 10, 2020 3:11:43 PM
To: users@tomcat.apache.org 
Subject: Re: Tomcat Processing Timer Question

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

Eric,

On 9/10/20 15:29, Eric Robinson wrote:
> Chris --
>
>
>> You should also look at worker-thread availability. When you see
>>  these "high latency" (which is usually a term reserved for I/O
>> characterization) events, do you have:>> 1. Available worker
>> threads (from the executor thread pool) 2. Any other
>> shared/limited resource (e.g. DB connection pool)
>>
>
> Good thought. I should mention that the hosted application is
> canned, and is the same for all tomcat instances, with only minor
> variations in version between them. User workflow is also similar.
> Over the years we've developed a good feel for expected
> performance and resource utilization based on the user count per
> instance. So when one instance exhibits anomalous performance, we
> tend to go right to networking issues.
>
>> Also, are you seeing the otherwise unexpected slowness on each
>> Tomcat node, or are you seeing it at the
>> load-balancer/multiplexer level?
>>
>
> We run multi-tenanted servers, with many instances of tomcat on
> each server. We've never seen issues at the load-balancer.

What I mean is, are you measuring the request at the Tomcat level, or at
the load-balancer level? If you are watching at the lb, then your lb
might pick a "busy" Tomcat and the request has to "wait in line" before
processing even begins. If you sample at the Tomcat level, you'll see no
discernible slowdown because the time "waiting in line" does not count.

> Very occasionally, there might be a problem at the server level.
> When that happens, all instances on that server may become
> sluggish. What I'm talking about in this thread are cases where
> only one instance on a server is showing slowness in its jasper
> logs. Also, we typically do not see the same slowness when we test
> the application locally from the same network. I've had my eye on
> TCP retransmits as a possible culprit for a while, but I just
> didn't know for sure if my understanding of the tomcat processing
> timer is correct.
I hope we've cleared that up for you, then.

You might also want to read about "buffer bloat" if you aren't already
familiar with that term.

- -chris
-BEGIN PGP SIGNATURE-
Comment: Using GnuPG with Thunderbird - https://www.enigmail.net/

iQIzBAEBCAAdFiEEMmKgYcQvxMe7tcJcHPApP6U8pFgFAl9aiH4ACgkQHPApP6U8
pFiFfBAAvuUbRXK+iDDy7lLsw6eplMFrXXDbkxzwtSNafvdGlDWmcPWwdazZwhQ+
TJ0pzkUwf3/RBslu/oORJYelYKhpUJLodj0Y85ZtbuKBcU2JpKk1uueJ/aqnmVFK
9yep3ReYdggEXQ3JNb1VeI4ASdEhFWoFw8pc6DAcJZ4K2JaUtGKrtoWG8n+oEXos
kmthl9Dm9ge3edLimd7TPTx11iODi6pX3ddJ+uRh7qmvXZp4wVyX8W+hkKiOhUQM
hokUd8RruXQm6wut5m+JSO6eLHqkKUBiLspzlz1x/Y4cuaqAlC8Pl5y9NFTuLK3e
gFJeDmBUthN2y5h9KNKW5r+Gf9bKpuv1+kw7CIaNoFv2JxCGTmfL3VKM+Bp/rh7J
1SbshsTW6ffo5hKRNJUJKvxry3uUvzrss0AYe338tJ1QA+sHuXHsN8ZVtY3b+51O
HBOYf3pgIPsSd6zXkjaSRoOAhVc9G5sbJHx8ycQt+yAyVvXEUwrqeeRbsJeADk2s
reaizm9WvO2kHSqP93ANNYe1QJ+rw9b5og0uoCE8x9eO+czRHbJ7LFF6/rvX+6Pn
TIYB7AHyV8P3PHpHtBGIgaNfnvIYbqx/hzxJpLlpNEcS2zARfi1YCnuNtbiH0KU/
AKkBx5FnZvwclCA3qK2oqBnSEcBUFz2yobq4wAy//qwgL2gEFNc=
=mcpm
-END PGP SIGNATURE-



RE: Tomcat Processing Timer Question

2020-09-10 Thread Eric Robinson
Chris --


> You should also look at worker-thread availability. When you see these "high
> latency" (which is usually a term reserved for I/O
> characterization) events, do you have:
>
> 1. Available worker threads (from the executor thread pool)
> 2. Any other shared/limited resource (e.g. DB connection pool)
>

Good thought. I should mention that the hosted application is canned, and is 
the same for all tomcat instances, with only minor variations in version 
between them. User workflow is also similar. Over the years we've developed a 
good feel for expected performance and resource utilization based on the user 
count per instance. So when one instance exhibits anomalous performance, we 
tend to go right to networking issues.

> Also, are you seeing the otherwise unexpected slowness on each Tomcat
> node, or are you seeing it at the load-balancer/multiplexer level?
>

We run multi-tenanted servers, with many instances of tomcat on each server. 
We've never seen issues at the load-balancer. Very occasionally, there might be 
a problem at the server level. When that happens, all instances on that server 
may become sluggish. What I'm talking about in this thread are cases where only 
one instance on a server is showing slowness in its jasper logs. Also, we 
typically do not see the same slowness when we test the application locally 
from the same network. I've had my eye on TCP retransmits as a possible culprit 
for a while, but I just didn't know for sure if my understanding of the tomcat 
processing timer is correct.

--Eric



> -Original Message-
> From: Christopher Schultz 
> Sent: Thursday, September 10, 2020 8:24 AM
> To: users@tomcat.apache.org
> Subject: Re: Tomcat Processing Timer Question
>
> -BEGIN PGP SIGNED MESSAGE-
> Hash: SHA256
>
> Eric,
>
> On 9/9/20 20:42, Eric Robinson wrote:
> > Hi Chris --
> >
> >> Are you have any specific problem you are trying to diagnose or fix?
> >> Or are you just academically interested in what conditions might
> cause "slow"
> >> request processing?
> >
> > A little of both. We've been running about 1500 instances of tomcat
> > for the past 15 years. We're not tomcat experts by any means, but
> > we're always looking to refine our understanding of tomcat
> > performance. Like many people, we have custom scripts (ours are in
> > python) that parse the jasper logs and produce a report that
> > summarizes responsiveness and helps us isolate underperforming tomcat
> > instances and JSP calls. Occasionally, we see evidence of chronic high
> > latency in processing time when there is no indication of bottlenecks
> > or problems in the servers themselves or the database back-ends. We
> > theorize that client connectivity could be responsible.
> That is a reasonable conclusion.
>
> You should also look at worker-thread availability. When you see these "high
> latency" (which is usually a term reserved for I/O
> characterization) events, do you have:
>
> 1. Available worker threads (from the executor thread pool) 2. Any other
> shared/limited resource (e.g. DB connection pool)
>
> Also, are you seeing the otherwise unexpected slowness on each Tomcat
> node, or are you seeing it at the load-balancer/multiplexer level?
>
> - -chris
>
> >> -Original Message- From: Christopher Schultz
> >>  Sent: Wednesday, September 9, 2020
> >> 7:41 AM To: users@tomcat.apache.org Subject: Re: Tomcat Processing
> >> Timer Question
> >>
> > Eric,
> >
> > On 9/8/20 17:29, Eric Robinson wrote:
> >>>> Got it. So TCP retransmits can impact tomcat processing time under
> >>>> certain conditions, more likely due to issues with receiving
> >>>> requests from the client than sending responses.
> > Well... buffering can happen either during the client-write phase or
> > the server-read phase or both.
> >
> > Imagine a slow network like EDGE or something similar where the first
> > byte arrives at Tomcat's poller and it handed-off to the
> > request-processor (t=0 as far as Tomcat is concerned) and uploads a
> > large image over that EDGE connection. The OS won't allocate an
> > infinite input buffer, so at some point the Poller will get byte 0
> > when the client hasn't uploaded the complete request. It may still
> > take several seconds to upload all those bytes.
> >
> > Imagine that the response is a transformed image so the response is
> > also large. The OS won't allocate an infinite output buffer, so at
> > some point the bytes will start streaming to the client (at a slow
> > rate). When the output buffer fills, yo

RE: Tomcat Processing Timer Question

2020-09-09 Thread Eric Robinson
Hi Chris --

> Are you have any specific problem you are trying to diagnose or fix?
> Or are you just academically interested in what conditions might cause "slow"
> request processing?

A little of both. We've been running about 1500 instances of tomcat for the 
past 15 years. We're not tomcat experts by any means, but we're always looking 
to refine our understanding of tomcat performance. Like many people, we have 
custom scripts (ours are in python) that parse the jasper logs and produce a 
report that summarizes responsiveness and helps us isolate underperforming 
tomcat instances and JSP calls. Occasionally, we see evidence of chronic high 
latency in processing time when there is no indication of bottlenecks or 
problems in the servers themselves or the database back-ends. We theorize that 
client connectivity could be responsible.


> -Original Message-
> From: Christopher Schultz 
> Sent: Wednesday, September 9, 2020 7:41 AM
> To: users@tomcat.apache.org
> Subject: Re: Tomcat Processing Timer Question
>
> -BEGIN PGP SIGNED MESSAGE-
> Hash: SHA256
>
> Eric,
>
> On 9/8/20 17:29, Eric Robinson wrote:
> > Got it. So TCP retransmits can impact tomcat processing time under
> > certain conditions, more likely due to issues with receiving requests
> > from the client than sending responses.
> Well... buffering can happen either during the client-write phase or the
> server-read phase or both.
>
> Imagine a slow network like EDGE or something similar where the first byte
> arrives at Tomcat's poller and it handed-off to the request-processor (t=0 as
> far as Tomcat is concerned) and uploads a large image over that EDGE
> connection. The OS won't allocate an infinite input buffer, so at some point
> the Poller will get byte 0 when the client hasn't uploaded the complete
> request. It may still take several seconds to upload all those bytes.
>
> Imagine that the response is a transformed image so the response is also
> large. The OS won't allocate an infinite output buffer, so at some point the
> bytes will start streaming to the client (at a slow rate). When the output
> buffer fills, your request-processing thread will stall when calling
> ServletOutputStream.write() to write those image bytes.
>
> If your image transform is instantaneous, your access log will report that the
> request took "a long time" relative to the amount of time spent actually
> processing the request. Basically, you are just waiting on I/O the entire 
> time.
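>
> [The finite-output-buffer point can be seen from userspace. The sketch
> below uses a local socket pair as a stand-in for a TCP connection to a
> slow client: a non-blocking sender accepts writes only until the OS
> buffers fill, which is exactly the point where a blocking write() -- and
> thus a Tomcat request-processing thread -- would stall.]

```python
import socket

# A connected pair; the receiving end never reads, so the sender's
# buffers eventually fill -- like a slow client draining a response.
rx, tx = socket.socketpair()
tx.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, 4096)
tx.setblocking(False)  # so we observe the "would block" point instead of stalling

buffered = 0
try:
    while True:
        buffered += tx.send(b"x" * 1024)
except BlockingIOError:
    # A blocking socket (Tomcat's case) would simply stall here,
    # and the stalled time counts toward the access log's elapsed time.
    pass

print(f"OS buffered about {buffered} bytes, then writes stopped")
tx.close(); rx.close()
```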
>
> Do you have a specific problem you are trying to diagnose or fix?
> Or are you just academically interested in what conditions might cause "slow"
> request processing?
>
> Hope that helps,
> - -chris
>
> >> -Original Message- From: Mark Thomas 
> >> Sent: Tuesday, September 8, 2020 4:05 PM To:
> >> users@tomcat.apache.org Subject: Re: Tomcat Processing Timer Question
> >>
> >> On 08/09/2020 21:46, Eric Robinson wrote:
> >>> Hi Mark --
> >>>
> >>> "If the request is split across multiple packets the timer starts
> >>> when Tomcat
> >> reads the first byte of the request from the first packet.
> >>> Tomcat stops the timer on a request after the last byte of the
> >>> response has
> >> been accepted by the network stack."
> >>>
> >>> Now we're getting somewhere. If tomcat starts its timer when it
> >>> reads the
> >> first byte of the client's request, and the request is split into
> >> multiple packets, doesn't it stand to reason that the timer would run
> >> longer when there are TCP retransmits?
> >>
> >> For the request, it depends. If the retransmit is for part of the
> >> request body and Tomcat hasn't read that far yet (or hasn't started
> >> reading at all) then it probably won't impact the processing time. If
> >> Tomcat is performing a read and waiting for that packet then it will.
> >>
> >> For the response, not unless the response is sufficiently big and the
> >> retransmit sufficiently early in the response that the TCP buffers
> >> fill and Tomcat is blocked from further writes.
> >>
> >> Mark
> >>
> >>
> >>>
> >>> --Eric
> >>>
> >>>> -Original Message- From: Mark Thomas 
> >>>> Sent: Tuesday, September 8, 2020 3:34 PM
> >>>> To: users@tomcat.apache.org Subject: Re: Tomcat Processing Timer
> >>>> Question
> >>>>
> >>>> On 08/09/2020 21:19, Eric Robinson wrote:
> >>>>> Hi Mark and Christopher,

RE: Tomcat Processing Timer Question

2020-09-08 Thread Eric Robinson
Mark --

Got it. So TCP retransmits can impact tomcat processing time under certain 
conditions, more likely due to issues with receiving requests from the client 
than sending responses.

-Eric

> -Original Message-
> From: Mark Thomas 
> Sent: Tuesday, September 8, 2020 4:05 PM
> To: users@tomcat.apache.org
> Subject: Re: Tomcat Processing Timer Question
>
> On 08/09/2020 21:46, Eric Robinson wrote:
> > Hi Mark --
> >
> > "If the request is split across multiple packets the timer starts when 
> > Tomcat
> reads the first byte of the request from the first packet.
> > Tomcat stops the timer on a request after the last byte of the response has
> been accepted by the network stack."
> >
> > Now we're getting somewhere. If tomcat starts its timer when it reads the
> first byte of the client's request, and the request is split into multiple 
> packets,
> doesn't it stand to reason that the timer would run longer when there are
> TCP retransmits?
>
> For the request, it depends. If the retransmit is for part of the request body
> and Tomcat hasn't read that far yet (or hasn't started reading at
> all) then it probably won't impact the processing time. If Tomcat is
> performing a read and waiting for that packet then it will.
>
> For the response, not unless the response is sufficiently big and the 
> retransmit
> sufficiently early in the response that the TCP buffers fill and Tomcat is
> blocked from further writes.
>
> Mark
>
>
> >
> > --Eric
> >
> >> -Original Message-
> >> From: Mark Thomas 
> >> Sent: Tuesday, September 8, 2020 3:34 PM
> >> To: users@tomcat.apache.org
> >> Subject: Re: Tomcat Processing Timer Question
> >>
> >> On 08/09/2020 21:19, Eric Robinson wrote:
> >>> Hi Mark and Christopher,
> >>>
> >>> For clarification, suppose a client sends an HTTP POST request
> >>> which
> >> is bigger than the PMTU and has to be broken into multiple packets.
> >> It sounds like you're saying that the request is buffered by the
> >> network stack, and the stack does not send it up to tomcat until the full
> request is received.
> >> That would make sense if every HTTP request is encapsulated in its
> >> own separate TCP connection. Most of the time, that is not the case.
> >> A single connection is held open and used for multiple HTTP requests.
> >> The network stack has no understanding of anything above TCP, so it
> >> does not know when an HTTP request is complete. It must therefore
> >> deliver whatever it has, and it would be up to tomcat to decide when
> >> the HTTP request is complete, wouldn't it?
> >>>
> >>> If that is the case, tomcat could receive a partial HTTP request and
> >> would have to wait for the rest before processing it. So when does
> >> tomcat start its processing timer?
> >>
> >> Tomcat starts the processing timer as soon as Tomcat processes the
> >> first bytes of the request. In practice, this means the network stack
> >> has to deliver the data to Tomcat, the Poller fires a read event, a
> >> thread is allocated to process that read event, any TLS handshake has
> >> completed and Tomcat has read the first real byte of the request.
> >>
> >> If the request is split across multiple packets the timer starts when
> >> Tomcat reads the first byte of the request from the first packet.
> >>
> >> Tomcat stops the timer on a request after the last byte of the
> >> response has been accepted by the network stack.
> >>
> >> HTH,
> >>
> >> Mark
> >>
> >>>
> >>>
> >>>> -Original Message-
> >>>> From: Christopher Schultz 
> >>>> Sent: Tuesday, September 8, 2020 1:19 PM
> >>>> To: users@tomcat.apache.org
> >>>> Subject: Re: Tomcat Processing Timer Question
> >>>>
> >>> Eric,
> >>>
> >>> On 9/8/20 13:46, Eric Robinson wrote:
> >>>>>> It is my understanding that the AccessLogValve %D field records
> >>>>>> the time from when the last byte of the client's request is
> >>>>>> received to when the last byte of the server's response is placed on
> the wire.
> >>>>>> Is that correct? If so, would TCP retransmissions impact the timer?
> >>>
> >>> I'm not positive, but I believe Tomcat has zero visibility into that
> >>> level of detail.
> >>>

RE: Tomcat Processing Timer Question

2020-09-08 Thread Eric Robinson
Hi Mark --

"If the request is split across multiple packets the timer starts when Tomcat 
reads the first byte of the request from the first packet.
Tomcat stops the timer on a request after the last byte of the response has 
been accepted by the network stack."

Now we're getting somewhere. If tomcat starts its timer when it reads the first 
byte of the client's request, and the request is split into multiple packets, 
doesn't it stand to reason that the timer would run longer when there are TCP 
retransmits?
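
[The timer semantics quoted above can be reproduced with a toy server, measuring from the first byte read to the end of the request: any delay between the client's packets shows up as server-side "processing" time, even though the server does no work. Plain sockets, not Tomcat itself; the 10-byte "request" and 0.2 s gap are arbitrary.]

```python
import socket, threading, time

REQUEST = b"0123456789"  # toy 10-byte "request", sent in two halves

def server(srv, out):
    conn, _ = srv.accept()
    data = conn.recv(1024)           # first read: the timer starts here
    start = time.monotonic()
    while len(data) < len(REQUEST):  # keep reading until the request is complete
        data += conn.recv(1024)
    out["elapsed"] = time.monotonic() - start
    conn.close()

srv = socket.socket()
srv.bind(("127.0.0.1", 0))
srv.listen(1)
out = {}
t = threading.Thread(target=server, args=(srv, out))
t.start()

cli = socket.create_connection(srv.getsockname())
cli.sendall(REQUEST[:5])
time.sleep(0.2)                      # simulated client-side delay / retransmit gap
cli.sendall(REQUEST[5:])
t.join()
cli.close(); srv.close()

print(f"server-side 'processing' time: {out['elapsed']:.2f}s")  # ~0.2s of pure waiting
```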

--Eric

> -Original Message-
> From: Mark Thomas 
> Sent: Tuesday, September 8, 2020 3:34 PM
> To: users@tomcat.apache.org
> Subject: Re: Tomcat Processing Timer Question
>
> On 08/09/2020 21:19, Eric Robinson wrote:
> > Hi Mark and Christopher,
> >
> > For clarification, suppose a client sends an HTTP POST request which
> is bigger than the PMTU and has to be broken into multiple packets. It
> sounds like you're saying that the request is buffered by the network stack,
> and the stack does not send it up to tomcat until the full request is 
> received.
> That would make sense if every HTTP request is encapsulated in its own
> separate TCP connection. Most of the time, that is not the case. A single
> connection is held open and used for multiple HTTP requests. The network
> stack has no understanding of anything above TCP, so it does not know when
> an HTTP request is complete. It must therefore deliver whatever it has, and it
> would be up to tomcat to decide when the HTTP request is complete,
> wouldn't it?
> >
> > If that is the case, tomcat could receive a partial HTTP request and
> would have to wait for the rest before processing it. So when does tomcat
> start its processing timer?
>
> Tomcat starts the processing timer as soon as Tomcat processes the first
> bytes of the request. In practice, this means the network stack has to deliver
> the data to Tomcat, the Poller fires a read event, a thread is allocated to
> process that read event, any TLS handshake has completed and Tomcat has
> read the first real byte of the request.
>
> If the request is split across multiple packets the timer starts when Tomcat
> reads the first byte of the request from the first packet.
>
> Tomcat stops the timer on a request after the last byte of the response has
> been accepted by the network stack.
>
> HTH,
>
> Mark
>
> >
> >
> >> -Original Message-
> >> From: Christopher Schultz 
> >> Sent: Tuesday, September 8, 2020 1:19 PM
> >> To: users@tomcat.apache.org
> >> Subject: Re: Tomcat Processing Timer Question
> >>
> > Eric,
> >
> > On 9/8/20 13:46, Eric Robinson wrote:
> >>>> It is my understanding that the AccessLogValve %D field records the
> >>>> time from when the last byte of the client's request is received to
> >>>> when the last byte of the server's response is placed on the wire.
> >>>> Is that correct? If so, would TCP retransmissions impact the timer?
> >
> > I'm not positive, but I believe Tomcat has zero visibility into that
> > level of detail.
> >
> >>>> If there are connectivity issues between the client and server,
> >>>> resulting in TCP retransmits, could that appear as higher response
> >>>> times in the localhost_access logs?
> >
> > This would only happen if the re-transmissions were to cause network
> > buffering in the OS such that the stream writes (at the Java level)
> > were to block (and therefore "take time" instead of being essentially
> instantaneous).
> >
> > -chris
> >>
> >> -
> >> To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
> >> For additional commands, e-mail: users-h...@tomcat.apache.org
> >
> > Disclaimer : This email and any files transmitted with it are
> confidential and intended solely for intended recipients. If you are not the
> named addressee you should not disseminate, distribute, copy or alter this
> email. Any views or opinions presented in this email are solely those of the
> author and might not represent those of Physician Select Management.
> Warning: Although Physician Select Management has taken reasonable
> precautions to ensure no viruses are present in this email, the company
> cannot accept responsibility for any loss or damage arising from the use of
> this email or attachments.
> >
> > -
> > To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
> > For additional commands, e-ma

RE: Tomcat Processing Timer Question

2020-09-08 Thread Eric Robinson
Hi Mark and Christopher,

For clarification, suppose a client sends an HTTP POST request which is bigger 
than the PMTU and has to be broken into multiple packets. It sounds like you're 
saying that the request is buffered by the network stack, and the stack does 
not send it up to tomcat until the full request is received. That would make 
sense if every HTTP request is encapsulated in its own separate TCP connection. 
Most of the time, that is not the case. A single connection is held open and 
used for multiple HTTP requests. The network stack has no understanding of 
anything above TCP, so it does not know when an HTTP request is complete. It must 
therefore deliver whatever it has, and it would be up to tomcat to decide when 
the HTTP request is complete, wouldn't it?

If that is the case, tomcat could receive a partial HTTP request and would have 
to wait for the rest before processing it. So when does tomcat start its 
processing timer?


> -Original Message-
> From: Christopher Schultz 
> Sent: Tuesday, September 8, 2020 1:19 PM
> To: users@tomcat.apache.org
> Subject: Re: Tomcat Processing Timer Question
>
> -BEGIN PGP SIGNED MESSAGE-
> Hash: SHA256
>
> Eric,
>
> On 9/8/20 13:46, Eric Robinson wrote:
> > It is my understanding that the AccessLogValve %D field records the
> > time from when the last byte of the client's request is received to
> > when the last byte of the server's response is placed on the wire. Is
> > that correct? If so, would TCP retransmissions impact the timer?
>
> I'm not positive, but I believe Tomcat has zero visibility into that level of
> detail.
>
> > If there are connectivity issues between the client and server,
> > resulting in TCP retransmits, could that appear as higher response
> > times in the localhost_access logs?
>
> This would only happen if the re-transmissions were to cause network
> buffering in the OS such that the stream writes (at the Java level) were to
> block (and therefore "take time" instead of being essentially instantaneous).
>
> - -chris
> -BEGIN PGP SIGNATURE-
> Comment: Using GnuPG with Thunderbird - https://www.enigmail.net/
>
> iQIzBAEBCAAdFiEEMmKgYcQvxMe7tcJcHPApP6U8pFgFAl9XyvUACgkQHPApP
> 6U8
> pFi2PQ/9Er4fN490tP2Fq911WSX7+wQnnRwE9w5JWx36b++DtFAWxKJTpQGg
> 0Dl2
> pyP/1vVPB6ZHVPdVEUUvYPAzhDAVWJVrIdXUvcaMg2tKpb5zzERhdxpG2vEH
> Qykb
> YaRTPqu0QHNySjMyQ9yT3Q8YDSObvXAYnR+7f1aT1g9UOma63z4mKE11RuQl
> oXGz
> SqjiLzHjDQmehplDjTXTSwRxcjnJftCKG0Jwin4f8Kto6tJ/AJdTxaWmwXeSiRcn
> QN8b586DpyS/k0hgkJ0bOWhbxVsy4aUhM+PeyjN4AXufzSjymY4hv4hpOO+3
> 7woT
> SRj3rTd2LtS8h5v4VVSIFXTeL+kEwjo3iya/Komd4Z7Pu+qw91ZLy7LIrZfV4MHp
> 8me2jLobBiosIlXSAAxVjY3zOVlzqEOIjOL+t/Qwhn+CM/nDLfuhtwdfuu+KGpN
> s
> /u18gauI4eb4MtoSETcvb5OaFHdkrmInCD3BXz9ZZRrnVCL9r9SLPN1yxENerdV
> q
> RJrvJMItb5tLf0XcK7Wvm+lJIdArEkSCzZ3o3uDWbiRkN0hKU5R/Jndr/qfL12dj
> /knGWyED0UGaY58TgOMAN6veMO991/PTL6to5Xr6RTivEQO+6YYS1zj1uGPc5
> p/n
> gGnJ9b9VWeZBvlyDb7H9CxOvvkdJzMum6WaagIvmR5zi8ZZx3do=
> =QoNs
> -END PGP SIGNATURE-
>
> -
> To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
> For additional commands, e-mail: users-h...@tomcat.apache.org

Disclaimer : This email and any files transmitted with it are confidential and 
intended solely for intended recipients. If you are not the named addressee you 
should not disseminate, distribute, copy or alter this email. Any views or 
opinions presented in this email are solely those of the author and might not 
represent those of Physician Select Management. Warning: Although Physician 
Select Management has taken reasonable precautions to ensure no viruses are 
present in this email, the company cannot accept responsibility for any loss or 
damage arising from the use of this email or attachments.

-
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org



Tomcat Processing Timer Question

2020-09-08 Thread Eric Robinson
It is my understanding that the AccessLogValve %D field records the time from 
when the last byte of the client's request is received to when the last byte of 
the server's response is placed on the wire. Is that correct? If so, would TCP 
retransmissions impact the timer? If there are connectivity issues between the 
client and server, resulting in TCP retransmits, could that appear as higher 
response times in the localhost_access logs?

-Eric

Disclaimer : This email and any files transmitted with it are confidential and 
intended solely for intended recipients. If you are not the named addressee you 
should not disseminate, distribute, copy or alter this email. Any views or 
opinions presented in this email are solely those of the author and might not 
represent those of Physician Select Management. Warning: Although Physician 
Select Management has taken reasonable precautions to ensure no viruses are 
present in this email, the company cannot accept responsibility for any loss or 
damage arising from the use of this email or attachments.


RE: Does Tomcat/Java get around the problem of 64K maximum client source ports?

2020-03-29 Thread Eric Robinson
> -Original Message-
> From: André Warnier (tomcat/perl) 
> Sent: Saturday, March 28, 2020 5:35 PM
> To: users@tomcat.apache.org
> Subject: Re: Does Tomcat/Java get around the problem of 64K maximum
> client source ports?
>
> On 27.03.2020 21:39, Eric Robinson wrote:
> > FYI, I don't have 1800 tomcat instances on one server. I have about 100
> instances on each of 18 servers.
>
> When one of these (attempted) connections fails, do you not get some error
> message which gives a clue as to what the failure is due to ?
> (should be a log somewhere, no ?)
>

Hi André -- Yes, it does log a connection failure message. It's been a while 
since the last time it happened so I don't recall the exact wording of the 
error, but the gist of it is that it could not create a TCP connection.

> Also, just for info :
> in the past, I have run into problems under Linux (no more connections
> accepted, neither incoming nor outgoing) whenever the actual number of
> TCP connections went above a certain number (maybe it was 64K).
> A TCP connection goes through a number of states (which you see with a
> netstat display), such as "ESTABLISHED" but also "TIME_WAIT",
> "CLOSE_WAIT" etc.. In some of these states, the connection no longer has
> any link to any process, but the connection still counts against the limit (of
> the OS/TCP stack).
>
> The case I'm talking about was a bit like yours : a webapp running under
> tomcat was making a connection to a remote host, but this connection was
> wrapped inside an object of some kind. When the webapp no longer needed
> the connection, it just discarded the wrapping object, which was left without
> references to it, and thus candidate for destruction at some point. But the
> discarded object never explicitly closed the underlying connection.
>
> Over a period of time, this left an accumulation of (no longer used)
> connections in the "CLOSE_WAIT" state (closed by the remote host side, but
> not by the webapp side), which just sat there until a GC happened, at which
> time the destruction of these objects really happened, and some implicit
> close was done at the OS level, which eliminated these pending underlying
> CLOSE_WAIT connections.
> And since the available heap was quite large, it took a long time before a GC
> happened, which allowed such CLOSE_WAIT connections to accumulate in
> the hundreds or thousands before being "recycled".
> Until a certain number was reached, and then the host became all but
> unreachable and very slow.
> That was a long time ago, and thus a lot of Java versions and Linux versions
> ago, so maybe something happened since then to avoid such a situation.
> But maybe also, you are suffering of some similar phenomenon.
> You could try to use netstat some more, and when you are having the
> problem, you should count at ALL the TCP connections, including the ones in
> CLOSE_WAIT, and just check if you do not have an obscene number of them
> in total.  There is definitely some limit number past which the OS starts 
> acting
> funny.
>
> (Note : unlike for TIME_WAIT e.g., there is no time limit for a connection in
> the CLOSE_WAIT state; it will stay in that state as long as the client side 
> has
> not explicitly closed it, in some kind of zombie half-life) See e.g. :
> https://users.cs.northwestern.edu/~agupta/cs340/project2/TCPIP_State_Tr
> ansition_Diagram.pdf
>

I'm familiar with the issue you described above. In the past, we addressed it 
by decreasing the TIME_WAIT timer and/or by enabling TCP reuse. That helps some.
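
[André's "count ALL the connections, including CLOSE_WAIT" advice is easy to script. The kernel encodes the state as a hex code in /proc/net/tcp (01 ESTABLISHED, 06 TIME_WAIT, 08 CLOSE_WAIT, ...). A sketch of a per-state counter, exercised here on a canned sample rather than a live box:]

```python
from collections import Counter

# Hex state codes used by the kernel in /proc/net/tcp
TCP_STATES = {
    "01": "ESTABLISHED", "02": "SYN_SENT",   "03": "SYN_RECV",
    "04": "FIN_WAIT1",   "05": "FIN_WAIT2",  "06": "TIME_WAIT",
    "07": "CLOSE",       "08": "CLOSE_WAIT", "09": "LAST_ACK",
    "0A": "LISTEN",      "0B": "CLOSING",
}

def count_states(proc_net_tcp_text):
    """Tally TCP connection states from /proc/net/tcp-formatted text."""
    counts = Counter()
    for line in proc_net_tcp_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) > 3:
            counts[TCP_STATES.get(fields[3], fields[3])] += 1
    return counts

# Canned sample: two ESTABLISHED entries and one lingering CLOSE_WAIT
sample = """\
  sl  local_address rem_address   st tx_queue rx_queue tr tm->when retrnsmt uid
   0: 0100007F:0CEA 0100007F:9C40 01 00000000:00000000 00:00000000 00000000 0
   1: 0100007F:0CEA 0100007F:9C41 01 00000000:00000000 00:00000000 00000000 0
   2: 0100007F:0CEB 0100007F:9C42 08 00000000:00000000 00:00000000 00000000 0
"""
print(count_states(sample))  # Counter({'ESTABLISHED': 2, 'CLOSE_WAIT': 1})
```

On a live system, feed it `open("/proc/net/tcp").read()` and watch whether CLOSE_WAIT grows without bound between GC cycles.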

>
>
> -
> To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
> For additional commands, e-mail: users-h...@tomcat.apache.org

Disclaimer : This email and any files transmitted with it are confidential and 
intended solely for intended recipients. If you are not the named addressee you 
should not disseminate, distribute, copy or alter this email. Any views or 
opinions presented in this email are solely those of the author and might not 
represent those of Physician Select Management. Warning: Although Physician 
Select Management has taken reasonable precautions to ensure no viruses are 
present in this email, the company cannot accept responsibility for any loss or 
damage arising from the use of this email or attachments.


RE: Does Tomcat/Java get around the problem of 64K maximum client source ports?

2020-03-27 Thread Eric Robinson
Thanks for all the feedback, André,  Christopher, and John. Let me see if I can 
quickly answer everyone's comments.

Since there is a TCB for each connection, and the OS knows which TCBs are 
associated with which processes, I don't see any problem using the same local 
port on different sockets. When a packet arrives from a remote server, the 
stack looks at the full socket details, checks for a matching TCB, and routes 
the packet to the appropriate process.  There's no confusion (except when using 
tools that don't show process names, like netstat without -p).
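
[The 4-tuple argument can be demonstrated from userspace. With SO_REUSEADDR set (plus SO_REUSEPORT on Linux 3.9+), two client sockets can share one local port as long as their destinations differ. A sketch assuming Linux; it is a toy demonstration of the TCB/4-tuple point, not how Tomcat or the JDBC driver actually opens connections:]

```python
import socket

def listener():
    s = socket.socket()
    s.bind(("127.0.0.1", 0))  # OS-assigned port
    s.listen(1)
    return s

def client_from(local_port, dest):
    c = socket.socket()
    c.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    c.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)  # Linux >= 3.9
    c.bind(("127.0.0.1", local_port))
    c.connect(dest)
    return c

a, b = listener(), listener()              # two distinct "remote servers"
c1 = client_from(0, a.getsockname())       # let the OS pick a source port
local_port = c1.getsockname()[1]
c2 = client_from(local_port, b.getsockname())  # reuse the SAME source port

print(c1.getsockname(), "->", a.getsockname())
print(c2.getsockname(), "->", b.getsockname())
# Both connects succeed because the (local, remote) 4-tuples differ.
```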

Using > 64K local source ports seems like a useful capability in high-load 
environments where tomcat is doing a lot of back-end access (i.e., where JSPs 
and class files frequently call back-end services). With hashing and indexing, 
having giant connection tables does not seem like an unrealistic amount of 
processing load to me. Linux-based stateful firewalls have to keep track of a 
lot more connections than that, with rule processing and even layer-7 
inspection at the same time, on relatively low-powered hardware.

FYI, I don't have 1800 tomcat instances on one server. I have about 100 
instances on each of 18 servers.

That said, I agree that the real focus should probably be on the JDBC driver. I 
asked the question here because it seemed like a good place to start. Any ideas 
where I could go to chat with JDBC developers?


--Eric

> -Original Message-
> From: Christopher Schultz 
> Sent: Friday, March 27, 2020 1:42 PM
> To: users@tomcat.apache.org
> Subject: Re: Does Tomcat/Java get around the problem of 64K maximum
> client source ports?
>
> -BEGIN PGP SIGNED MESSAGE-
> Hash: SHA256
>
> André,
>
> On 3/27/20 11:01, André Warnier (tomcat/perl) wrote:
> > On 27.03.2020 14:27, André Warnier (tomcat/perl) wrote:
> >> On 26.03.2020 20:42, Eric Robinson wrote:
> >>>> -Original Message- From: Olaf Kock 
> >>>> Sent: Thursday, March 26, 2020 2:06 PM
> >>>> To: users@tomcat.apache.org Subject: Re: Does Tomcat/Java get
> >>>> around the problem of 64K maximum client source ports?
> >>>>
> >>>> Hi Eric,
> >>>>
> >>>> On 26.03.20 18:58, Eric Robinson wrote:
> >>>>> Greetings,
> >>>>>
> >>>>> Many people say the maximum number of client ports is 64K.
> >>>>> However,
> >>>> TCP connections only require unique sockets, which are defined
> >>>> as...
> >>>>>
> >>>>> local_IP:local_port -> remote_ip:remote_port
> >>>>>
> >>>>> Theoretically, it is possible for a client process to keep using
> >>>>> the same local
> >>>> source port, as long as the connections are to a unique
> >>>> destinations. For example on a local machine, the following
> >>>> connections should be possible...
> >>>>>
> >>>>> 192.168.5.100:1400 -> 192.168.5.200:3306 192.168.5.100:1400
> >>>>> -> 192.168.5.201:3306 192.168.5.100:1400 ->
> >>>>> 192.168.5.202:3306 192.168.5.100:1400 ->
> >>>>> 192.168.5.203:3306
> >>>>>
> >>>>> I've seen this demonstrated successfully here:
> >>>>>
> >>>>> https://serverfault.com/questions/326819/does-the-tcp-source-port-
> have
> >>>>>
> >>>>>
> - -to-be-unique-per-host
> >>>>>
> >>>>> As someone on that page pointed out, while it is possible, it does
> >>>>> not
> >>>> commonly occur in practice "because most TCP APIs don't provide a
> >>>> way to create more than one connection with the same source port,
> >>>> unless they have different source IP addresses." This leads to the
> >>>> 64K maximum client port range, but it is really a limitation of the
> >>>> APIs, not TCP.
> >>>>>
> >>>>> So how does tomcat handle things? Is it limited to a maximum 64K
> >>>>> client
> >>>> source ports, or is it 64K per destination, as it should be?
> >>>>
> >>>> To be honest, I can't recall ever having seen a web or application
> >>>> server that accepts 64K concurrent connections at all, let alone to
> >>>> a single client.
> >>>>
> >>>> I can't come up with any reason to do so, I'd assume that there's a
> >>>> DOS attack if I get 100 concurrent incoming connections from a
> >>>> single IP, 

RE: Does Tomcat/Java get around the problem of 64K maximum client source ports?

2020-03-26 Thread Eric Robinson
> -Original Message-
> From: Olaf Kock 
> Sent: Thursday, March 26, 2020 2:06 PM
> To: users@tomcat.apache.org
> Subject: Re: Does Tomcat/Java get around the problem of 64K maximum
> client source ports?
>
> Hi Eric,
>
> On 26.03.20 18:58, Eric Robinson wrote:
> > Greetings,
> >
> > Many people say the maximum number of client ports is 64K. However,
> TCP connections only require unique sockets, which are defined as...
> >
> > local_IP:local_port -> remote_ip:remote_port
> >
> > Theoretically, it is possible for a client process to keep using the same 
> > local
> source port, as long as the connections are to a unique destinations. For
> example on a local machine, the following connections should be possible...
> >
> > 192.168.5.100:1400 -> 192.168.5.200:3306
> > 192.168.5.100:1400 -> 192.168.5.201:3306
> > 192.168.5.100:1400 -> 192.168.5.202:3306
> > 192.168.5.100:1400 -> 192.168.5.203:3306
> >
> > I've seen this demonstrated successfully here:
> >
> > https://serverfault.com/questions/326819/does-the-tcp-source-port-have
> > -to-be-unique-per-host
> >
> > As someone on that page pointed out, while it is possible, it does not
> commonly occur in practice "because most TCP APIs don't provide a way to
> create more than one connection with the same source port, unless they
> have different source IP addresses." This leads to the 64K maximum client
> port range, but it is really a limitation of the APIs, not TCP.
> >
> > So how does tomcat handle things? Is it limited to a maximum 64K client
> source ports, or is it 64K per destination, as it should be?
>
> To be honest, I can't recall ever having seen a web or application server
> that accepts 64K concurrent connections at all, let alone to a single client.
>
> I can't come up with any reason to do so, I'd assume that there's a DOS attack
> if I get 100 concurrent incoming connections from a single IP, and a
> programming error if the server sets up more than 1K outgoing connections
>
> Maybe I'm missing the obvious, or have only administered meaningless
> installations, but I fail to see the real world relevance of this question.
>
>

I don't blame you for being puzzled, but this is not about tomcat accepting 
connections. It's about tomcat acting as the client, where MySQL is the server. 
I'm referring to client connections from tomcat to MySQL. We have about 1800 
instances of tomcat running. This question comes up once in a while when tomcat 
can't connect to MySQL. Trust me, it can be an issue.

--Eric


Disclaimer : This email and any files transmitted with it are confidential and 
intended solely for intended recipients. If you are not the named addressee you 
should not disseminate, distribute, copy or alter this email. Any views or 
opinions presented in this email are solely those of the author and might not 
represent those of Physician Select Management. Warning: Although Physician 
Select Management has taken reasonable precautions to ensure no viruses are 
present in this email, the company cannot accept responsibility for any loss or 
damage arising from the use of this email or attachments.

-
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org



RE: Tomcat Server Using 100% CPU

2019-08-09 Thread Eric Robinson
  LISTEN  30166/java
tcp        0      0 0.0.0.0:42922       0.0.0.0:*           LISTEN      8597/java
tcp        0      0 0.0.0.0:3434        0.0.0.0:*           LISTEN      17313/java
tcp        0      0 0.0.0.0:6506        0.0.0.0:*           LISTEN      4447/java
tcp        0      0 0.0.0.0:36778       0.0.0.0:*           LISTEN      300/java
tcp        0      0 0.0.0.0:6475        0.0.0.0:*           LISTEN      7261/java
tcp        0      0 0.0.0.0:35787       0.0.0.0:*           LISTEN      19622/java
tcp        0      0 0.0.0.0:46795       0.0.0.0:*           LISTEN      3673/java
tcp        0      0 0.0.0.0:3211        0.0.0.0:*           LISTEN      30166/java
tcp        0      0 0.0.0.0:3436        0.0.0.0:*           LISTEN      31863/java
tcp        0      0 0.0.0.0:3116        0.0.0.0:*           LISTEN      29948/java
tcp        0      0 0.0.0.0:6509        0.0.0.0:*           LISTEN      9462/java
tcp        0      0 0.0.0.0:6477        0.0.0.0:*           LISTEN      7678/java
tcp        0      0 0.0.0.0:39885       0.0.0.0:*           LISTEN      4447/java
tcp        0      0 0.0.0.0:33581       0.0.0.0:*           LISTEN      300/java
tcp        0      0 0.0.0.0:6478        0.0.0.0:*           LISTEN      8027/java

-Original Message-
From: André Warnier (tomcat) 
Sent: Thursday, August 8, 2019 3:53 PM
To: users@tomcat.apache.org
Subject: Re: Tomcat Server Using 100% CPU

On 08.08.2019 20:08, Eric Robinson wrote:
> Utkarsh and John, thank you for your feedback.
>
> Since everything was originally on Windows, and we built a new Linux server 
> with fresh tomcat installs, and the only thing we moved over from the old 
> Windows servers was the tomcat application folder itself, and the 100% CPU 
> problem persisted, I can't imagine what else could be causing it except the 
> tomcats, but I'm open to ideas.
>
> When it happens, all the tomcats start using high CPU at the same time. See 
> the following top output.
>
> top - 11:06:44 up 1 day,  6:59,  7 users,  load average: 36.85, 32.67, 34.89
> Tasks: 245 total,   4 running, 241 sleeping,   0 stopped,   0 zombie
> %Cpu(s): 80.7 us, 13.5 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  5.7 si,  0.0 st
> KiB Mem : 48132572 total, 11677420 free,  5572688 used, 30882464 buff/cache
> KiB Swap: 15626236 total, 15584324 free,    41912 used. 41859232 avail Mem
>
>PID USER  PR  NIVIRTRESSHR S  %CPU %MEM TIME+ COMMAND
> 19379 site211   20   0 3529072 447444  24632 S 120.4  0.9   3:37.19 java
> 20092 site555   20   0 2530376 375500  24496 S  72.4  0.8   2:01.44 java
> 21077 site450   20   0 2530292 298260  24292 S  69.6  0.6   1:10.92 java
> 20378 site436   20   0 3262200 347160  24096 S  68.3  0.7   2:47.26 java
> 19957 site522   20   0 2596856 373532  24364 S  52.0  0.8   2:37.13 java
> 19537 site396   20   0 2862724 386860  23820 S  51.1  0.8   2:34.25 java
> 19228 site116   20   0 3595652 490032  24640 S  50.5  1.0   5:03.28 java
> 20941 site456   20   0 2596996 338740  24488 S  49.2  0.7   1:32.89 java
> 20789 site354   20   0 2596920 327612  24480 S  42.9  0.7   1:30.47 java
> 20657 site327   20   0 3123004 346308  24540 S  41.4  0.7   1:50.97 java
> 20524 site203   20   0 2458376 340400  25416 S  39.8  0.7   1:48.01 java
> 19675 site487   20   0 2530296 390948  24408 S  35.7  0.8   2:37.95 java
> 20233 site535   20   0 2530292 324112  24360 S  32.9  0.7   1:54.31 java
> 19809 site514   20   0 2530216 400308  24340 S  25.7  0.8   2:35.97 java
> 44 root  20   0   0  0  0 R  19.1  0.0 159:46.15 
> ksoftirqd/7
>   3926 root  20   0  208512  22668   4128 S  16.9  0.0 242:45.07 iotop
>   2036 root  20   0   0  0  0 R  13.2  0.0   1:38.31 
> kworker/7:0
>
> I'll check the localhost_access logs and see if something suspicious stands 
> out.
>

Access logs is the first thing to look at of course (just in case you are 
subject to some DoS attack), but other things of interest :
1) what is "the webapp" in question ? Any reason to suspect it may have been 
hijacked, to do something it is not supposed to do ? does the webapp allow 
for clients to upload "things" to the server (files, documents, images,..) ?
2) if you look at the tomcat logs, do you recognise all the webapps which get 
deployed when tomcat starts ? Or is there an alien there ?
3) if you run "top" again, then enter a "c" in the console, it will show more 
details about the "java" command it is running.
Similarly, doing a "ps -ef" command and comparing the result (by PID) with the 
top output, may give more details.
That would show (us) at least the startup parameters of your tomcat(s).
4) speaking 

RE: Tomcat Server Using 100% CPU

2019-08-09 Thread Eric Robinson
Paul --

I've only used jconsole and thread dumps before. Never used jstack. I'll look 
into it.


On Thu, 08 Aug 2019, 20:22 Coty Sutherland,  wrote:

> I'd suggest writing a small script to loop about 10 times and capture
> top and thread dumps with jstack at the same time, then wait a few
> seconds then repeat. After that you can grab the pid/tid from the top
> output and compare that with your thread dump to see exactly what the
> thread is doing for the iteration/duration you specify.
>
> Other questions that I haven't seen asked, how long does the CPU usage
> persist? Is it only at startup or does it randomly start after some uptime?
> Have your webapps or dependencies changed around the time the issue
> started? Do the working and nonworking servers run the same webapps
> with the same workload?
>
> On Thu, Aug 8, 2019 at 2:09 PM Eric Robinson 
> wrote:
>
> > Utkarsh and John, thank you for your feedback.
> >
> > Since everything was originally on Windows, and we built a new Linux
> > server with fresh tomcat installs, and the only thing we moved over
> > from the old Windows servers was the tomcat application folder
> > itself, and the 100% CPU problem persisted, I can't imagine what
> > else could be causing it except the tomcats, but I'm open to ideas.
> >
> > When it happens, all the tomcats start using high CPU at the same time.
> > See the following top output.
> >
> > top - 11:06:44 up 1 day,  6:59,  7 users,  load average: 36.85,
> > 32.67,
> > 34.89
> > Tasks: 245 total,   4 running, 241 sleeping,   0 stopped,   0 zombie
> > %Cpu(s): 80.7 us, 13.5 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  5.7
> > si,
> > 0.0 st
> > KiB Mem : 48132572 total, 11677420 free,  5572688 used, 30882464
> buff/cache
> > KiB Swap: 15626236 total, 15584324 free,41912 used. 41859232 avail
> Mem
> >
> >   PID USER  PR  NIVIRTRESSHR S  %CPU %MEM TIME+
> COMMAND
> > 19379 site211   20   0 3529072 447444  24632 S 120.4  0.9   3:37.19 java
> > 20092 site555   20   0 2530376 375500  24496 S  72.4  0.8   2:01.44 java
> > 21077 site450   20   0 2530292 298260  24292 S  69.6  0.6   1:10.92 java
> > 20378 site436   20   0 3262200 347160  24096 S  68.3  0.7   2:47.26 java
> > 19957 site522   20   0 2596856 373532  24364 S  52.0  0.8   2:37.13 java
> > 19537 site396   20   0 2862724 386860  23820 S  51.1  0.8   2:34.25 java
> > 19228 site116   20   0 3595652 490032  24640 S  50.5  1.0   5:03.28 java
> > 20941 site456   20   0 2596996 338740  24488 S  49.2  0.7   1:32.89 java
> > 20789 site354   20   0 2596920 327612  24480 S  42.9  0.7   1:30.47 java
> > 20657 site327   20   0 3123004 346308  24540 S  41.4  0.7   1:50.97 java
> > 20524 site203   20   0 2458376 340400  25416 S  39.8  0.7   1:48.01 java
> > 19675 site487   20   0 2530296 390948  24408 S  35.7  0.8   2:37.95 java
> > 20233 site535   20   0 2530292 324112  24360 S  32.9  0.7   1:54.31 java
> > 19809 site514   20   0 2530216 400308  24340 S  25.7  0.8   2:35.97 java
> >44 root  20   0   0  0  0 R  19.1  0.0 159:46.15
> > ksoftirqd/7
> >  3926 root  20   0  208512  22668   4128 S  16.9  0.0 242:45.07 iotop
> >  2036 root  20   0   0  0  0 R  13.2  0.0   1:38.31
> > kworker/7:0
> >
> > I'll check the localhost_access logs and see if something suspicious
> > stands out.
> >
> > --Eric
> >
> >
> > -Original Message-
> > From: Utkarsh Dave 
> > Sent: Thursday, August 8, 2019 12:33 PM
> > To: Tomcat Users List 
> > Subject: Re: Tomcat Server Using 100% CPU
> >
> > Did you review the localhost_access log file? Which
> > web-application is using tomcat the most?
> >
> > On Thu, Aug 8, 2019 at 9:53 AM Eric Robinson
> > 
> > wrote:
> >
> > > We have a farm of VMs, each running multiple instances of tomcat
> > > (up to 80 instances per server). Everything has been running fine
> > > for years, but recently one server has started nailing the CPU to
> > > 100%
> > utilization.
> > >
> > > We have tried:
> > >
> > >
> > >   *   Different versions of tomcat and JDK
> > >   *   Doubling the resources to 16 cores and 56 GB RAM
> > >   *   Moving the VM to different physical server
> > >   *   Rebuilding the tomcat instances on a brand new VM using Windows
> > > Server 2019
> > >   *   Rebuilding the tomcat instances on a brand new VM using Red Hat
> > > Enterprise Linux 7.5
> > >
> > > Nothing has worked. No matter where we run the tomcats, 

RE: Tomcat Server Using 100% CPU

2019-08-09 Thread Eric Robinson
Coty --

There is a normal period of high CPU right after startup, but that's expected. 
However, it usually peaks well after startup (minutes or hours) and then stays 
that way, sometimes all night. When it happens, all the running tomcats (about 
20 of them) show high CPU, like 20-50% each.

I can write a script to loop and output top and thread dumps. Should I do it 
for all running instances or just for a selected target instance?

--Eric

-Original Message-
From: Coty Sutherland 
Sent: Thursday, August 8, 2019 1:22 PM
To: Tomcat Users List 
Subject: Re: Tomcat Server Using 100% CPU

I'd suggest writing a small script to loop about 10 times, capturing top output 
and thread dumps with jstack at the same time, then waiting a few seconds and 
repeating. After that you can grab the pid/tid from the top output and compare 
it with your thread dump to see exactly what each thread is doing for the 
iteration/duration you specify.
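A minimal sketch of such a capture loop, assuming the hot process is the java 
PID 19379 from the top output above (the PID and output file names are 
illustrative):

```shell
#!/bin/sh
# Sketch of the suggested capture loop: per-thread top output plus a
# jstack thread dump, ~10 iterations a few seconds apart. PID 19379
# (the busiest java process in the top output) is illustrative;
# substitute your own Tomcat PID.
PID=${1:-19379}

# top reports thread IDs in decimal; jstack prints the same IDs in hex
# as "nid=0x...", so convert before grepping the dump.
tid_to_nid() { printf '0x%x' "$1"; }

i=0
# Only loop if jstack is on the PATH and the target process exists.
while [ "$i" -lt 10 ] && command -v jstack >/dev/null 2>&1 \
      && kill -0 "$PID" 2>/dev/null; do
    ts=$(date +%Y%m%d-%H%M%S)
    top -H -b -n 1 -p "$PID" > "top-$PID-$ts.txt"   # per-thread CPU
    jstack -l "$PID" > "jstack-$PID-$ts.txt"        # matching dump
    sleep 5
    i=$((i + 1))
done

# e.g. a hot thread shown by top as TID 19379 would appear in the
# jstack dump as nid=0x4bb3:
tid_to_nid 19379
```

The `-H` flag makes top report per-thread CPU, and the decimal TID it prints 
corresponds to the hex `nid` field in the jstack dump, which is how the top 
output and the thread dump get matched up.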

Other questions that I haven't seen asked: how long does the CPU usage persist? 
Is it only at startup, or does it randomly start after some uptime?
Have your webapps or dependencies changed around the time the issue started? Do 
the working and nonworking servers run the same webapps with the same workload?

On Thu, Aug 8, 2019 at 2:09 PM Eric Robinson 
wrote:

> Utkarsh and John, thank you for your feedback.
>
> Since everything was originally on Windows, and we built a new Linux
> server with fresh tomcat installs, and the only thing we moved over
> from the old Windows servers was the tomcat application folder itself,
> and the 100% CPU problem persisted, I can't imagine what else could be
> causing it except the tomcats, but I'm open to ideas.
>
> When it happens, all the tomcats start using high CPU at the same time.
> See the following top output.
>
> top - 11:06:44 up 1 day,  6:59,  7 users,  load average: 36.85, 32.67,
> 34.89
> Tasks: 245 total,   4 running, 241 sleeping,   0 stopped,   0 zombie
> %Cpu(s): 80.7 us, 13.5 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  5.7
> si,
> 0.0 st
> KiB Mem : 48132572 total, 11677420 free,  5572688 used, 30882464 buff/cache
> KiB Swap: 15626236 total, 15584324 free,41912 used. 41859232 avail Mem
>
>   PID USER  PR  NIVIRTRESSHR S  %CPU %MEM TIME+ COMMAND
> 19379 site211   20   0 3529072 447444  24632 S 120.4  0.9   3:37.19 java
> 20092 site555   20   0 2530376 375500  24496 S  72.4  0.8   2:01.44 java
> 21077 site450   20   0 2530292 298260  24292 S  69.6  0.6   1:10.92 java
> 20378 site436   20   0 3262200 347160  24096 S  68.3  0.7   2:47.26 java
> 19957 site522   20   0 2596856 373532  24364 S  52.0  0.8   2:37.13 java
> 19537 site396   20   0 2862724 386860  23820 S  51.1  0.8   2:34.25 java
> 19228 site116   20   0 3595652 490032  24640 S  50.5  1.0   5:03.28 java
> 20941 site456   20   0 2596996 338740  24488 S  49.2  0.7   1:32.89 java
> 20789 site354   20   0 2596920 327612  24480 S  42.9  0.7   1:30.47 java
> 20657 site327   20   0 3123004 346308  24540 S  41.4  0.7   1:50.97 java
> 20524 site203   20   0 2458376 340400  25416 S  39.8  0.7   1:48.01 java
> 19675 site487   20   0 2530296 390948  24408 S  35.7  0.8   2:37.95 java
> 20233 site535   20   0 2530292 324112  24360 S  32.9  0.7   1:54.31 java
> 19809 site514   20   0 2530216 400308  24340 S  25.7  0.8   2:35.97 java
>44 root  20   0   0  0  0 R  19.1  0.0 159:46.15
> ksoftirqd/7
>  3926 root  20   0  208512  22668   4128 S  16.9  0.0 242:45.07 iotop
>  2036 root  20   0   0  0  0 R  13.2  0.0   1:38.31
> kworker/7:0
>
> I'll check the localhost_access logs and see if something suspicious
> stands out.
>
> --Eric
>
>
> -Original Message-
> From: Utkarsh Dave 
> Sent: Thursday, August 8, 2019 12:33 PM
> To: Tomcat Users List 
> Subject: Re: Tomcat Server Using 100% CPU
>
> Did you review the localhost_access log file? Which web-application
> is using tomcat the most?
>
> On Thu, Aug 8, 2019 at 9:53 AM Eric Robinson 
> wrote:
>
> > We have a farm of VMs, each running multiple instances of tomcat (up
> > to 80 instances per server). Everything has been running fine for
> > years, but recently one server has started nailing the CPU to 100%
> utilization.
> >
> > We have tried:
> >
> >
> >   *   Different versions of tomcat and JDK
> >   *   Doubling the resources to 16 cores and 56 GB RAM
> >   *   Moving the VM to different physical server
> >   *   Rebuilding the tomcat instances on a brand new VM using Windows
> > Server 2019
> >   *   Rebuilding the tomcat instances on a brand new VM using Red Hat
> > Enterprise Linux 7.5
> >
> > Nothing has worked. No matter where we run the tomcats, they drive
> > CPU up

RE: Tomcat Server Using 100% CPU

2019-08-08 Thread Eric Robinson
André, Paul, and Coty, you've all provided some great next steps. I'll 
investigate further and be back in touch!

--Eric

-Original Message-
From: André Warnier (tomcat) 
Sent: Thursday, August 8, 2019 3:53 PM
To: users@tomcat.apache.org
Subject: Re: Tomcat Server Using 100% CPU

On 08.08.2019 20:08, Eric Robinson wrote:
> Utkarsh and John, thank you for your feedback.
>
> Since everything was originally on Windows, and we built a new Linux server 
> with fresh tomcat installs, and the only thing we moved over from the old 
> Windows servers was the tomcat application folder itself, and the 100% CPU 
> problem persisted, I can't imagine what else could be causing it except the 
> tomcats, but I'm open to ideas.
>
> When it happens, all the tomcats start using high CPU at the same time. See 
> the following top output.
>
> top - 11:06:44 up 1 day,  6:59,  7 users,  load average: 36.85, 32.67, 34.89
> Tasks: 245 total,   4 running, 241 sleeping,   0 stopped,   0 zombie
> %Cpu(s): 80.7 us, 13.5 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  5.7
> si,  0.0 st KiB Mem : 48132572 total, 11677420 free,  5572688 used, 30882464 
> buff/cache
> KiB Swap: 15626236 total, 15584324 free,41912 used. 41859232 avail Mem
>
>PID USER  PR  NIVIRTRESSHR S  %CPU %MEM TIME+ COMMAND
> 19379 site211   20   0 3529072 447444  24632 S 120.4  0.9   3:37.19 java
> 20092 site555   20   0 2530376 375500  24496 S  72.4  0.8   2:01.44 java
> 21077 site450   20   0 2530292 298260  24292 S  69.6  0.6   1:10.92 java
> 20378 site436   20   0 3262200 347160  24096 S  68.3  0.7   2:47.26 java
> 19957 site522   20   0 2596856 373532  24364 S  52.0  0.8   2:37.13 java
> 19537 site396   20   0 2862724 386860  23820 S  51.1  0.8   2:34.25 java
> 19228 site116   20   0 3595652 490032  24640 S  50.5  1.0   5:03.28 java
> 20941 site456   20   0 2596996 338740  24488 S  49.2  0.7   1:32.89 java
> 20789 site354   20   0 2596920 327612  24480 S  42.9  0.7   1:30.47 java
> 20657 site327   20   0 3123004 346308  24540 S  41.4  0.7   1:50.97 java
> 20524 site203   20   0 2458376 340400  25416 S  39.8  0.7   1:48.01 java
> 19675 site487   20   0 2530296 390948  24408 S  35.7  0.8   2:37.95 java
> 20233 site535   20   0 2530292 324112  24360 S  32.9  0.7   1:54.31 java
> 19809 site514   20   0 2530216 400308  24340 S  25.7  0.8   2:35.97 java
> 44 root  20   0   0  0  0 R  19.1  0.0 159:46.15 
> ksoftirqd/7
>   3926 root  20   0  208512  22668   4128 S  16.9  0.0 242:45.07 iotop
>   2036 root  20   0   0  0  0 R  13.2  0.0   1:38.31 
> kworker/7:0
>
> I'll check the localhost_access logs and see if something suspicious stands 
> out.
>

Access logs are the first thing to look at of course (just in case you are 
subject to some DoS attack), but other things of interest:
1) What is "the webapp" in question? Any reason to suspect it may have been 
hijacked, to do something it is not supposed to do? Does the webapp allow 
clients to upload "things" to the server (files, documents, images, ...)?
2) If you look at the tomcat logs, do you recognise all the webapps which get 
deployed when tomcat starts? Or is there an alien there?
3) If you run "top" again, then press "c" in the console, it will show more 
details about the "java" commands it is running.
Similarly, running "ps -ef" and comparing the result (by PID) with the top 
output may give more details.
That would show (us) at least the startup parameters of your tomcat(s).
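Point 3 can be sketched non-interactively; here the shell's own PID (`$$`) 
stands in for a Tomcat java PID taken from top:

```shell
#!/bin/sh
# Show the full command line (hence the startup parameters) for one PID,
# the non-interactive equivalent of pressing "c" in top. The shell's own
# PID is a stand-in; substitute a java PID from the top output.
PID=$$
ps -o pid= -o args= -p "$PID"

# Or list the command lines of every java process at once
# (the bracket trick keeps grep from matching itself):
ps -ef | grep '[j]ava' || true
```

In an interactive top session, pressing "c" toggles the same full-command-line 
display in the COMMAND column.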
4) Speaking as a faithful Tomcat committer, we always like to repeat that in 
99% of the cases it turns out that the problem is with the webapp, not with 
Tomcat. The fact that in your case you have changed about everything except 
the webapp, and the problem persists, would only tend to increase that 
suspicion...
5) Can the webapp do any logging that would show what it's doing, while it is 
happily slurping all your CPU time?
6) Does the tomcat error log show anything of interest?
7) Under Linux as root, enter: netstat --tcp -pan | grep LISTEN
(This shows all TCP ports your server is listening on, and which PIDs/processes 
control these ports.)
Anything unexpected there? Worse: anything unexpected which would match the 
PID of one of your tomcats?
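A sketch of how that netstat output can be matched against PIDs; the two 
sample lines below are fabricated, not output from the server under discussion:

```shell
#!/bin/sh
# Parse a (fabricated) sample of "netstat --tcp -pan | grep LISTEN"
# output into port / PID / program triples, so each listening port can
# be compared against the java PIDs seen in top or ps.
sample='tcp        0      0 0.0.0.0:8080    0.0.0.0:*   LISTEN      19379/java
tcp        0      0 0.0.0.0:22      0.0.0.0:*   LISTEN      1021/sshd'

# Field 4 is the local address, field 7 is PID/program.
# First line prints: port=8080 pid=19379 prog=java
printf '%s\n' "$sample" | awk '{
    split($4, addr, ":");   # local address -> host, port
    split($7, proc, "/");   # PID/program   -> pid, name
    print "port=" addr[2], "pid=" proc[1], "prog=" proc[2]
}'
```

Any listening port owned by one of the java PIDs that you did not configure 
yourself would be the red flag André describes.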

André


> --Eric
>
>
> -Original Message-
> From: Utkarsh Dave 
> Sent: Thursday, August 8, 2019 12:33 PM
> To: Tomcat Users List 
> Subject: Re: Tomcat Server Using 100% CPU
>
> Did you review the localhost_access log file? Which web-application is 
> using tomcat the most?
>
> On Thu, Aug 8, 2019 at 9:53 AM Eric Robinson 
> wrote:
>
>> We have a farm of VMs, each running multiple instances of tomcat (up
>> to 80 instances per server). Everything ha

RE: Tomcat Server Using 100% CPU

2019-08-08 Thread Eric Robinson
Utkarsh and John, thank you for your feedback.

Since everything was originally on Windows, and we built a new Linux server 
with fresh tomcat installs, and the only thing we moved over from the old 
Windows servers was the tomcat application folder itself, and the 100% CPU 
problem persisted, I can't imagine what else could be causing it except the 
tomcats, but I'm open to ideas.

When it happens, all the tomcats start using high CPU at the same time. See the 
following top output.

top - 11:06:44 up 1 day,  6:59,  7 users,  load average: 36.85, 32.67, 34.89
Tasks: 245 total,   4 running, 241 sleeping,   0 stopped,   0 zombie
%Cpu(s): 80.7 us, 13.5 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  5.7 si,  0.0 st
KiB Mem : 48132572 total, 11677420 free,  5572688 used, 30882464 buff/cache
KiB Swap: 15626236 total, 15584324 free,    41912 used. 41859232 avail Mem

  PID USER  PR  NIVIRTRESSHR S  %CPU %MEM TIME+ COMMAND
19379 site211   20   0 3529072 447444  24632 S 120.4  0.9   3:37.19 java
20092 site555   20   0 2530376 375500  24496 S  72.4  0.8   2:01.44 java
21077 site450   20   0 2530292 298260  24292 S  69.6  0.6   1:10.92 java
20378 site436   20   0 3262200 347160  24096 S  68.3  0.7   2:47.26 java
19957 site522   20   0 2596856 373532  24364 S  52.0  0.8   2:37.13 java
19537 site396   20   0 2862724 386860  23820 S  51.1  0.8   2:34.25 java
19228 site116   20   0 3595652 490032  24640 S  50.5  1.0   5:03.28 java
20941 site456   20   0 2596996 338740  24488 S  49.2  0.7   1:32.89 java
20789 site354   20   0 2596920 327612  24480 S  42.9  0.7   1:30.47 java
20657 site327   20   0 3123004 346308  24540 S  41.4  0.7   1:50.97 java
20524 site203   20   0 2458376 340400  25416 S  39.8  0.7   1:48.01 java
19675 site487   20   0 2530296 390948  24408 S  35.7  0.8   2:37.95 java
20233 site535   20   0 2530292 324112  24360 S  32.9  0.7   1:54.31 java
19809 site514   20   0 2530216 400308  24340 S  25.7  0.8   2:35.97 java
   44 root  20   0   0  0  0 R  19.1  0.0 159:46.15 ksoftirqd/7
 3926 root  20   0  208512  22668   4128 S  16.9  0.0 242:45.07 iotop
 2036 root  20   0   0  0  0 R  13.2  0.0   1:38.31 kworker/7:0

I'll check the localhost_access logs and see if something suspicious stands out.

--Eric


-Original Message-
From: Utkarsh Dave 
Sent: Thursday, August 8, 2019 12:33 PM
To: Tomcat Users List 
Subject: Re: Tomcat Server Using 100% CPU

Did you review the localhost_access log file? Which web-application is using 
tomcat the most?

On Thu, Aug 8, 2019 at 9:53 AM Eric Robinson 
wrote:

> We have a farm of VMs, each running multiple instances of tomcat (up
> to 80 instances per server). Everything has been running fine for
> years, but recently one server has started nailing the CPU to 100% 
> utilization.
>
> We have tried:
>
>
>   *   Different versions of tomcat and JDK
>   *   Doubling the resources to 16 cores and 56 GB RAM
>   *   Moving the VM to different physical server
>   *   Rebuilding the tomcat instances on a brand new VM using Windows
> Server 2019
>   *   Rebuilding the tomcat instances on a brand new VM using Red Hat
> Enterprise Linux 7.5
>
> Nothing has worked. No matter where we run the tomcats, they drive CPU
> up to 100%. Meanwhile the other six servers are still running fine.
> They all run the same canned tomcat applications.
>
> We would appreciate some guidance on getting to the bottom of this problem.
>
> --Eric
>
>
> Disclaimer : This email and any files transmitted with it are
> confidential and intended solely for intended recipients. If you are
> not the named addressee you should not disseminate, distribute, copy or alter 
> this email.
> Any views or opinions presented in this email are solely those of the
> author and might not represent those of Physician Select Management.
> Warning: Although Physician Select Management has taken reasonable
> precautions to ensure no viruses are present in this email, the
> company cannot accept responsibility for any loss or damage arising
> from the use of this email or attachments.
>

-
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org



Tomcat Server Using 100% CPU

2019-08-08 Thread Eric Robinson
We have a farm of VMs, each running multiple instances of tomcat (up to 80 
instances per server). Everything has been running fine for years, but recently 
one server has started nailing the CPU to 100% utilization.

We have tried:


  *   Different versions of tomcat and JDK
  *   Doubling the resources to 16 cores and 56 GB RAM
  *   Moving the VM to a different physical server
  *   Rebuilding the tomcat instances on a brand new VM using Windows Server 
2019
  *   Rebuilding the tomcat instances on a brand new VM using Red Hat 
Enterprise Linux 7.5

Nothing has worked. No matter where we run the tomcats, they drive CPU up to 
100%. Meanwhile the other six servers are still running fine. They all run the 
same canned tomcat applications.

We would appreciate some guidance on getting to the bottom of this problem.

--Eric




SSO/AD Authentication

2018-07-03 Thread Eric Robinson
We have users in AD domain “Billing” who need to run a tomcat application from 
a server that is in domain “BackOffice.” 

Is it possible for a user in the Billing domain to log in to the tomcat 
application in the BackOffice domain without having to re-authenticate?

--Eric




RE: Is it Normal for Tomcat 8 to Use 20-80% More Memory Than Tomcat 6?

2017-12-27 Thread Eric Robinson
> > More heap or more native memory?
> >
> 
> With the exact same Xms and Xmx settings, I get vastly different resident and
> virtual image sizes from the Linux ps command.
> 
> 
>  tomcatA: jdk1.8.0_152, res: 694312, virt: 5045084
>  tomcatB: jdk1.6.0_21, res: 332840, virt: 3922656
> 
> 
> And b is also tomcat8 right?

No, tomcatB is using tomcat6. 

> Can you make that also tomcat8 but keep java8?
> 
> 
> I mean A is java8 and tomcat8.. so make a C that is tomcat6 and java8
> 
> 

I don't think so. This is a requirement of the software company whose 
application solution we use. They are requiring us to move to tomcat 8 with jdk 
1.8. If we try to mix tomcat8 with jdk 1.6, supposedly we would have problems. 
I guess all this is being driven by the need to switch to TLS 1.2. I'm not sure 
if that would be a function of tomcat or java.

--Eric


RE: Is it Normal for Tomcat 8 to Use 20-80% More Memory Than Tomcat 6?

2017-12-22 Thread Eric Robinson
> Eric,
> 
> Just curious how much ram do you have in the server and cpu resources.
> 
> #free -m and # cat /proc/cpuinfo | egrep 'cores|processor'
> 
> (Not to insult your intelligence , I am just specifying what I was curious to 
> see)
> 
> And it's always easier to copy/paste than to think.
> 
> I see in another thread you went from Java 1.6_xxx to 1.8_xxx
> 
> That could be the whole story right there.
> 
> 

No offense taken. You're right, copy and paste is easier...

[root@app17 alley]# free -m
             total       used       free     shared    buffers     cached
Mem:         64415      58110       6304          0       2938      18382
-/+ buffers/cache:      36789      27626
Swap:        15999        759      15240
[root@app17 alley]# cat /proc/cpuinfo | egrep 'cores|processor'
processor   : 0
cpu cores   : 6
processor   : 1
cpu cores   : 6
processor   : 2
cpu cores   : 6
processor   : 3
cpu cores   : 6
processor   : 4
cpu cores   : 6
processor   : 5
cpu cores   : 6
processor   : 6
cpu cores   : 6
processor   : 7
cpu cores   : 6
processor   : 8
cpu cores   : 6
processor   : 9
cpu cores   : 6
processor   : 10
cpu cores   : 6
processor   : 11
cpu cores   : 6

--Eric



