Re: JNDIRealm thread blocking issue

2021-10-29 Thread Suvendu Sekhar Mondal
On Fri, Oct 29, 2021 at 2:02 PM Rémy Maucherat  wrote:
>
> On Fri, Oct 29, 2021 at 9:28 AM Suvendu Sekhar Mondal  
> wrote:
> >
> > Hello Chris,
> >
> > On Fri, Oct 29, 2021 at 2:46 AM Christopher Schultz
> >  wrote:
> > >
> > > Suvendu,
> > >
> > > On 10/28/21 12:55, Suvendu Sekhar Mondal wrote:
> > > > Hello Everyone,
> > > >
> > > > I was investigating one thread pool exhaustion issue. Thread dump
> > > > analysis showed that all HTTP threads were waiting for a ReentrantLock
> > > > object. Object address 0x00066d727f28 were same for all of the
> > > > waiting threads:
> > > >
> > > > "http-nio-18100-exec-86" #32808 daemon prio=5 os_prio=0
> > > > tid=0x51835800 nid=0x29bc waiting on condition
> > > > [0x7a5be000]
> > > > java.lang.Thread.State: WAITING (parking)
> > > > at sun.misc.Unsafe.park(Native Method)
> > > > - parking to wait for  <0x00066d727f28> (a
> > > > java.util.concurrent.locks.ReentrantLock$NonfairSync)
> > > > at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
> > > > at 
> > > > java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
> > > > at 
> > > > java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
> > > > at 
> > > > java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
> > > > at 
> > > > java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:209)
> > > > at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285)
> > > > at org.apache.catalina.realm.JNDIRealm.get(JNDIRealm.java:2385)
> > > > at org.apache.catalina.realm.JNDIRealm.authenticate(JNDIRealm.java:1274)
> > > >
> > > > There was no hint in the thread dump about which thread was owning the
> > > > lock. Luckily, one heap dump was taken before generating thread dump.
> > > > When I queried the heap dump for that ReentrantLock object, I saw that
> > > > another thread(http-nio-18100-exec-4) was holding the
> > > > lock(exclusiveOwnerThread). There was NO trace of
> > > > http-nio-18100-exec-4 thread in any of the thread dumps! So it was a
> > > > "lock without an owner" case.
> > >
> > > I think you are looking at several pieces of evidence that may or may
> > > not correlate to each other at all. The fact that the thread wasn't in
> > > the thread dump indicates that the thread (or even the whole JVM) had
> > > terminated between the time you took the heap-dump and the thread dump.
> > > Most likely, the monitor was owned by another thread when you took your
> > > thread-dump. Try using other tools which *do* reveal the lock-holders
> > > identity.
> > >
> >
> > This issue has happened a few times. "Busy Thread Count" was high
> > during the problem period. JVM was up and running when I collected
> > heap and thread dumps - pid was not changed in-between. I used jstack,
> > visualvm, jcmd - nothing revealed owing thread details. Only heap
> > dumps had some information on that object and which thread was holding
> > onto it. Here is a snap: https://pasteboard.co/D7dV3jej6zId.jpg
> >
> > I can simulate similar blocking without Tomcat with dummy code. There
> > also nothing reveals the owner's identity except the heap dump. Here
> > is sample: https://gist.github.com/suv3ndu/2ec9fe660d2b833996817ed62186eac2
> >
> > > > After glancing through the Tomcat’s JNDIRealm.get() code and
> > > > beyond[1], I can see lock is being acquired on singleConnectionLock.
> > > > That lock is getting released either in the close() or release()
> > > > method. So, if something bad happens to the thread which is trying to
> > > > establish a connection, then lock will be held without a proper owner
> > > > and a thread blocking situation will be created. Am I interpreting the
> > > > code correctly? Should we not handle any failure inside get()?
> > > >
> > > > Also, I still have not got the reason why the thread got terminated.
> > > > Any suggestions on how I can enable any specific logging?
> > > >
> > > > My setup is:
> > > > Tomcat version: 9.0.39
> > > > Connector: NIO
> > > > JDK: AdoptOpenJDK: 1.8.192
> > > > OS: Windows 2016
> > >
> > > Looks like you need a whole bunch of upgrades. Search the Tomcat 9.x
> > > changelog for "JNDIRealm" and you'll see there have been changes since
> > > 9.0.39 that may have already resolved this issue. Are you able to
> > > re-test with Tomcat 9.0.54?
> > >
> >
> > It will not be easy for me to upgrade it and test it. Lots of approval
> > is required to get that done. :(
> >
> > >  > [1]
> > > https://github.com/apache/tomcat/blob/57a6a40fc9f995e4d449358bbde047aab6d9f39a/java/org/apache/catalina/realm/JNDIRealm.java#L2553
> > >
> > > Note that you are looking at the current version of JNDIRealm.java. The
> > > version you are running is 17 commits behind that.
> > >
> > > The line of code calling ReentrantLock.lock in your code would be
> > > https://github.com/apac

Re: JNDIRealm thread blocking issue

2021-10-29 Thread Rémy Maucherat
On Fri, Oct 29, 2021 at 9:28 AM Suvendu Sekhar Mondal  wrote:
>
> Hello Chris,
>
> On Fri, Oct 29, 2021 at 2:46 AM Christopher Schultz
>  wrote:
> >
> > Suvendu,
> >
> > On 10/28/21 12:55, Suvendu Sekhar Mondal wrote:
> > > Hello Everyone,
> > >
> > > I was investigating one thread pool exhaustion issue. Thread dump
> > > analysis showed that all HTTP threads were waiting for a ReentrantLock
> > > object. Object address 0x00066d727f28 were same for all of the
> > > waiting threads:
> > >
> > > "http-nio-18100-exec-86" #32808 daemon prio=5 os_prio=0
> > > tid=0x51835800 nid=0x29bc waiting on condition
> > > [0x7a5be000]
> > > java.lang.Thread.State: WAITING (parking)
> > > at sun.misc.Unsafe.park(Native Method)
> > > - parking to wait for  <0x00066d727f28> (a
> > > java.util.concurrent.locks.ReentrantLock$NonfairSync)
> > > at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
> > > at 
> > > java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
> > > at 
> > > java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
> > > at 
> > > java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
> > > at 
> > > java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:209)
> > > at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285)
> > > at org.apache.catalina.realm.JNDIRealm.get(JNDIRealm.java:2385)
> > > at org.apache.catalina.realm.JNDIRealm.authenticate(JNDIRealm.java:1274)
> > >
> > > There was no hint in the thread dump about which thread was owning the
> > > lock. Luckily, one heap dump was taken before generating thread dump.
> > > When I queried the heap dump for that ReentrantLock object, I saw that
> > > another thread(http-nio-18100-exec-4) was holding the
> > > lock(exclusiveOwnerThread). There was NO trace of
> > > http-nio-18100-exec-4 thread in any of the thread dumps! So it was a
> > > "lock without an owner" case.
> >
> > I think you are looking at several pieces of evidence that may or may
> > not correlate to each other at all. The fact that the thread wasn't in
> > the thread dump indicates that the thread (or even the whole JVM) had
> > terminated between the time you took the heap-dump and the thread dump.
> > Most likely, the monitor was owned by another thread when you took your
> > thread-dump. Try using other tools which *do* reveal the lock-holders
> > identity.
> >
>
> This issue has happened a few times. "Busy Thread Count" was high
> during the problem period. JVM was up and running when I collected
> heap and thread dumps - pid was not changed in-between. I used jstack,
> visualvm, jcmd - nothing revealed owing thread details. Only heap
> dumps had some information on that object and which thread was holding
> onto it. Here is a snap: https://pasteboard.co/D7dV3jej6zId.jpg
>
> I can simulate similar blocking without Tomcat with dummy code. There
> also nothing reveals the owner's identity except the heap dump. Here
> is sample: https://gist.github.com/suv3ndu/2ec9fe660d2b833996817ed62186eac2
>
> > > After glancing through the Tomcat’s JNDIRealm.get() code and
> > > beyond[1], I can see lock is being acquired on singleConnectionLock.
> > > That lock is getting released either in the close() or release()
> > > method. So, if something bad happens to the thread which is trying to
> > > establish a connection, then lock will be held without a proper owner
> > > and a thread blocking situation will be created. Am I interpreting the
> > > code correctly? Should we not handle any failure inside get()?
> > >
> > > Also, I still have not got the reason why the thread got terminated.
> > > Any suggestions on how I can enable any specific logging?
> > >
> > > My setup is:
> > > Tomcat version: 9.0.39
> > > Connector: NIO
> > > JDK: AdoptOpenJDK: 1.8.192
> > > OS: Windows 2016
> >
> > Looks like you need a whole bunch of upgrades. Search the Tomcat 9.x
> > changelog for "JNDIRealm" and you'll see there have been changes since
> > 9.0.39 that may have already resolved this issue. Are you able to
> > re-test with Tomcat 9.0.54?
> >
>
> It will not be easy for me to upgrade it and test it. Lots of approval
> is required to get that done. :(
>
> >  > [1]
> > https://github.com/apache/tomcat/blob/57a6a40fc9f995e4d449358bbde047aab6d9f39a/java/org/apache/catalina/realm/JNDIRealm.java#L2553
> >
> > Note that you are looking at the current version of JNDIRealm.java. The
> > version you are running is 17 commits behind that.
> >
> > The line of code calling ReentrantLock.lock in your code would be
> > https://github.com/apache/tomcat/blob/57a6a40fc9f995e4d449358bbde047aab6d9f39a/java/org/apache/catalina/realm/JNDIRealm.java#L2385
> > which is "return null" indicating that there is a version mismatch
> > between the code you are running and the code you are reading.
> >
>
> Yeah

Re: JNDIRealm thread blocking issue

2021-10-29 Thread Suvendu Sekhar Mondal
Hello Chris,

On Fri, Oct 29, 2021 at 2:46 AM Christopher Schultz
 wrote:
>
> Suvendu,
>
> On 10/28/21 12:55, Suvendu Sekhar Mondal wrote:
> > Hello Everyone,
> >
> > I was investigating one thread pool exhaustion issue. Thread dump
> > analysis showed that all HTTP threads were waiting for a ReentrantLock
> > object. Object address 0x00066d727f28 were same for all of the
> > waiting threads:
> >
> > "http-nio-18100-exec-86" #32808 daemon prio=5 os_prio=0
> > tid=0x51835800 nid=0x29bc waiting on condition
> > [0x7a5be000]
> > java.lang.Thread.State: WAITING (parking)
> > at sun.misc.Unsafe.park(Native Method)
> > - parking to wait for  <0x00066d727f28> (a
> > java.util.concurrent.locks.ReentrantLock$NonfairSync)
> > at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
> > at 
> > java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
> > at 
> > java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
> > at 
> > java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
> > at 
> > java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:209)
> > at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285)
> > at org.apache.catalina.realm.JNDIRealm.get(JNDIRealm.java:2385)
> > at org.apache.catalina.realm.JNDIRealm.authenticate(JNDIRealm.java:1274)
> >
> > There was no hint in the thread dump about which thread was owning the
> > lock. Luckily, one heap dump was taken before generating thread dump.
> > When I queried the heap dump for that ReentrantLock object, I saw that
> > another thread(http-nio-18100-exec-4) was holding the
> > lock(exclusiveOwnerThread). There was NO trace of
> > http-nio-18100-exec-4 thread in any of the thread dumps! So it was a
> > "lock without an owner" case.
>
> I think you are looking at several pieces of evidence that may or may
> not correlate to each other at all. The fact that the thread wasn't in
> the thread dump indicates that the thread (or even the whole JVM) had
> terminated between the time you took the heap-dump and the thread dump.
> Most likely, the monitor was owned by another thread when you took your
> thread-dump. Try using other tools which *do* reveal the lock-holders
> identity.
>

This issue has happened a few times. "Busy Thread Count" was high
during the problem period. JVM was up and running when I collected
heap and thread dumps - pid was not changed in-between. I used jstack,
visualvm, jcmd - nothing revealed owing thread details. Only heap
dumps had some information on that object and which thread was holding
onto it. Here is a snap: https://pasteboard.co/D7dV3jej6zId.jpg

I can simulate similar blocking without Tomcat with dummy code. There
also nothing reveals the owner's identity except the heap dump. Here
is sample: https://gist.github.com/suv3ndu/2ec9fe660d2b833996817ed62186eac2

> > After glancing through the Tomcat’s JNDIRealm.get() code and
> > beyond[1], I can see lock is being acquired on singleConnectionLock.
> > That lock is getting released either in the close() or release()
> > method. So, if something bad happens to the thread which is trying to
> > establish a connection, then lock will be held without a proper owner
> > and a thread blocking situation will be created. Am I interpreting the
> > code correctly? Should we not handle any failure inside get()?
> >
> > Also, I still have not got the reason why the thread got terminated.
> > Any suggestions on how I can enable any specific logging?
> >
> > My setup is:
> > Tomcat version: 9.0.39
> > Connector: NIO
> > JDK: AdoptOpenJDK: 1.8.192
> > OS: Windows 2016
>
> Looks like you need a whole bunch of upgrades. Search the Tomcat 9.x
> changelog for "JNDIRealm" and you'll see there have been changes since
> 9.0.39 that may have already resolved this issue. Are you able to
> re-test with Tomcat 9.0.54?
>

It will not be easy for me to upgrade it and test it. Lots of approval
is required to get that done. :(

>  > [1]
> https://github.com/apache/tomcat/blob/57a6a40fc9f995e4d449358bbde047aab6d9f39a/java/org/apache/catalina/realm/JNDIRealm.java#L2553
>
> Note that you are looking at the current version of JNDIRealm.java. The
> version you are running is 17 commits behind that.
>
> The line of code calling ReentrantLock.lock in your code would be
> https://github.com/apache/tomcat/blob/57a6a40fc9f995e4d449358bbde047aab6d9f39a/java/org/apache/catalina/realm/JNDIRealm.java#L2385
> which is "return null" indicating that there is a version mismatch
> between the code you are running and the code you are reading.
>

Yeah, that's correct. Sorry for the confusion. Our version is running:
https://github.com/apache/tomcat/blob/95658dfd868216db0773c38aad8eebf544024b09/java/org/apache/catalina/realm/JNDIRealm.java#L2385
That get() has not changed since then. That's why I asked about

Re: JNDIRealm thread blocking issue

2021-10-28 Thread Christopher Schultz

Suvendu,

On 10/28/21 12:55, Suvendu Sekhar Mondal wrote:

Hello Everyone,

I was investigating one thread pool exhaustion issue. Thread dump
analysis showed that all HTTP threads were waiting for a ReentrantLock
object. Object address 0x00066d727f28 were same for all of the
waiting threads:

"http-nio-18100-exec-86" #32808 daemon prio=5 os_prio=0
tid=0x51835800 nid=0x29bc waiting on condition
[0x7a5be000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for  <0x00066d727f28> (a
java.util.concurrent.locks.ReentrantLock$NonfairSync)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
at 
java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:209)
at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285)
at org.apache.catalina.realm.JNDIRealm.get(JNDIRealm.java:2385)
at org.apache.catalina.realm.JNDIRealm.authenticate(JNDIRealm.java:1274)

There was no hint in the thread dump about which thread was owning the
lock. Luckily, one heap dump was taken before generating thread dump.
When I queried the heap dump for that ReentrantLock object, I saw that
another thread(http-nio-18100-exec-4) was holding the
lock(exclusiveOwnerThread). There was NO trace of
http-nio-18100-exec-4 thread in any of the thread dumps! So it was a
"lock without an owner" case.


I think you are looking at several pieces of evidence that may or may 
not correlate to each other at all. The fact that the thread wasn't in 
the thread dump indicates that the thread (or even the whole JVM) had 
terminated between the time you took the heap-dump and the thread dump. 
Most likely, the monitor was owned by another thread when you took your 
thread-dump. Try using other tools which *do* reveal the lock-holders 
identity.



After glancing through the Tomcat’s JNDIRealm.get() code and
beyond[1], I can see lock is being acquired on singleConnectionLock.
That lock is getting released either in the close() or release()
method. So, if something bad happens to the thread which is trying to
establish a connection, then lock will be held without a proper owner
and a thread blocking situation will be created. Am I interpreting the
code correctly? Should we not handle any failure inside get()?

Also, I still have not got the reason why the thread got terminated.
Any suggestions on how I can enable any specific logging?

My setup is:
Tomcat version: 9.0.39
Connector: NIO
JDK: AdoptOpenJDK: 1.8.192
OS: Windows 2016


Looks like you need a whole bunch of upgrades. Search the Tomcat 9.x 
changelog for "JNDIRealm" and you'll see there have been changes since 
9.0.39 that may have already resolved this issue. Are you able to 
re-test with Tomcat 9.0.54?


> [1] 
https://github.com/apache/tomcat/blob/57a6a40fc9f995e4d449358bbde047aab6d9f39a/java/org/apache/catalina/realm/JNDIRealm.java#L2553


Note that you are looking at the current version of JNDIRealm.java. The 
version you are running is 17 commits behind that.


The line of code calling ReentrantLock.lock in your code would be 
https://github.com/apache/tomcat/blob/57a6a40fc9f995e4d449358bbde047aab6d9f39a/java/org/apache/catalina/realm/JNDIRealm.java#L2385 
which is "return null" indicating that there is a version mismatch 
between the code you are running and the code you are reading.


-chris

-
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org



JNDIRealm thread blocking issue

2021-10-28 Thread Suvendu Sekhar Mondal
Hello Everyone,

I was investigating one thread pool exhaustion issue. Thread dump
analysis showed that all HTTP threads were waiting for a ReentrantLock
object. Object address 0x00066d727f28 were same for all of the
waiting threads:

"http-nio-18100-exec-86" #32808 daemon prio=5 os_prio=0
tid=0x51835800 nid=0x29bc waiting on condition
[0x7a5be000]
   java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for  <0x00066d727f28> (a
java.util.concurrent.locks.ReentrantLock$NonfairSync)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
at 
java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:209)
at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285)
at org.apache.catalina.realm.JNDIRealm.get(JNDIRealm.java:2385)
at org.apache.catalina.realm.JNDIRealm.authenticate(JNDIRealm.java:1274)

There was no hint in the thread dump about which thread was owning the
lock. Luckily, one heap dump was taken before generating thread dump.
When I queried the heap dump for that ReentrantLock object, I saw that
another thread(http-nio-18100-exec-4) was holding the
lock(exclusiveOwnerThread). There was NO trace of
http-nio-18100-exec-4 thread in any of the thread dumps! So it was a
"lock without an owner" case.

After glancing through the Tomcat’s JNDIRealm.get() code and
beyond[1], I can see lock is being acquired on singleConnectionLock.
That lock is getting released either in the close() or release()
method. So, if something bad happens to the thread which is trying to
establish a connection, then lock will be held without a proper owner
and a thread blocking situation will be created. Am I interpreting the
code correctly? Should we not handle any failure inside get()?

Also, I still have not got the reason why the thread got terminated.
Any suggestions on how I can enable any specific logging?

My setup is:
Tomcat version: 9.0.39
Connector: NIO
JDK: AdoptOpenJDK: 1.8.192
OS: Windows 2016

[1] 
https://github.com/apache/tomcat/blob/57a6a40fc9f995e4d449358bbde047aab6d9f39a/java/org/apache/catalina/realm/JNDIRealm.java#L2553

Thanks & Regards,
Suvendu

-
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org