Re: Stuck thread when invoking the AsyncContext.complete() method

2016-02-10 Thread Jeroen van Ooststroom

On 10/02/16 12:00, Mark Thomas wrote:

On 10/02/2016 09:20, Mark Thomas wrote:

On 09/02/2016 16:19, Jeroen van Ooststroom wrote:


Hello Mark,

I have been trying to get an isolated test case reproducing the issue,
but so far without luck. I'll try to reproduce it again with the
original application and get a thread dump for you at the time it got
stuck. Is there any Tomcat-level logging you want to see as well
leading up to the stuck thread issue?

Thanks,
Jeroen...

Hello Mark,

See the thread dump attached.

Thanks. Is it edited? There are some "..." sections in the thread dump
that don't look normal.

A further piece of information that would be helpful is the version of
IceFaces you are using.

I've been looking at the source code for IcePush 4.1.0 and I have a
theory. Keep in mind I don't know the IcePush code at all.

1. Container Thread A ends up in AsyncAdaptingServlet.service()
This calls request.startAsync() followed by pushServer.service().
This moves the async state to STARTING which will only change to
STARTED once the container thread exists the
AsyncAdaptingServlet.service()

2. Assuming (as it appears from the stack traces) that pushServer is an
instance of BlockingConnectionServer, this calls activeServer.service

3. activeServer is an instance of RunningServer. The serviceMethod calls
pendingRequest.put(pushRequest). Call this push request A.

4. Meanwhile, non-container thread B is running the "Notification Queue
Consumer". It holds a lock on the BlockingConnectionServer instance.
It tries to process push request A and calls complete().
This blocks because the AsyncContext is still in state STARTING for
that request.

5. Container thread A continues and reaches sendNotifications(). This
blocks because that method is synchronized and non-container thread B
currently holds the lock.

At this point deadlock occurs.

The async request is in state STARTING.

That state can't change until container thread A exits the service() method.

Container Thread A is blocked in the service() method waiting for a lock
on BlockingConnectionServer.

Container Thread B holds the lock on BlockingConnectionServer and won't
release it until the the async request changes state to STARTED.


My guess is that the deadlock only occurs if the "Notification Queue
Consumer" happens to try processing pending requests in the short period
of time between container thread A calling
pendingRequest.put(pushRequest) and sendNotifications().

The Servlet spec is clear that any call to AsyncContext.complete() will
not complete until the associated container thread has completed
processing the original request. On that basis it is my current view
that this is an ICEpush bug.

You should be able to confirm this fairly easily is you use a debugge
HTH,

Mark

-
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org

This seems to be a valid theory. If I can reproduce it using open source 
version of ICEfaces, I'll dig a bit further to see if I can prove this 
to be an ICEfaces bug.


Thanks again,
Jeroen...

-
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org



Re: Stuck thread when invoking the AsyncContext.complete() method

2016-02-10 Thread Jeroen van Ooststroom

Hello Mark,

Thank you for looking into this in more detail. I did edit the thread 
dump, but just to remove what I believed to be irrelevant pieces. I did 
not remove any lock information, just bits of stack traces of ICEfaces.


This was using ICEfaces-EE 3.3.0.GA_P03. I'll have a look to see if I 
can reproduce it using an open source version of ICEfaces.


Thanks,
Jeroen...

On 10/02/16 10:20, Mark Thomas wrote:

On 09/02/2016 16:19, Jeroen van Ooststroom wrote:


Hello Mark,

I have been trying to get an isolated test case reproducing the issue,
but so far without luck. I'll try to reproduce it again with the
original application and get a thread dump for you at the time it got
stuck. Is there any Tomcat-level logging you want to see as well
leading up to the stuck thread issue?

Thanks,
Jeroen...

Hello Mark,

See the thread dump attached.

Thanks. Is it edited? There are some "..." sections in the thread dump
that don't look normal.

A further piece of information that would be helpful is the version of
IceFaces you are using.

Thanks,

Mark


-
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org




-
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org



Re: Stuck thread when invoking the AsyncContext.complete() method

2016-02-09 Thread Jeroen van Ooststroom

On 05/02/16 13:46, Jeroen van Ooststroom wrote:

On 05/02/16 12:05, Mark Thomas wrote:

On 05/02/2016 09:00, Mark Thomas wrote:

On 05/02/2016 00:30, Jeroen van Ooststroom wrote:

Hello,

Using Tomcat 8.0.23 and Tomcat 8.0.30 with Java 1.7.0_25 on CentOS 
5.11
we are getting a stuck thread issue when our server is on high(er) 
load.

It seems to happen when one of our non-Container Threads invokes the
AsyncContext.complete() method while the AsyncStateMachine class’ 
state

class member is set to STARTING instead of STARTED, resulting in the
invocation of the Object’s wait() method in the
pauseNonContainerThread() method. It never seems to recover from this.

I tried a couple of things within our source code trying to get around
this:

1. If HttpServletRequest.isAsyncStarted() returns true invoke
AsyncContext.complete(), but if it returns false

That should not be possible. And I think I have figured out what the
problem is. Let me explain.

The expected sequence of processing is:

0) Servlet.service() -> entry
1) Servlet.service() -> call startAsync()
2) Servlet.service() -> start a non-container thread to do some
processing
3) Servlet.service() -> exits
4) non-container thread calls complete()

Each connection has a dedicated async state machine that tracks the
current state of the connection.

At point 0 the state machine is in the state DISPATCHED (a.k.a. not 
async)

At point 1 the state machine transitions to STARTING.
At point 3 the state machine transitions to STARTED.
At point 4 the state machine transitions to COMPLETING, a container
thread is assigned which fires the onComplete() event for any listeners
and then changes state to DISPATCHED.

The point of the waits is to ensure that step 4) always occurs after
step 3). If a non-container thread is paused, the aynsc post-processing
code for container threads will unpause the thread once it completes.

If a non-container thread calls complete() at an inappropriate time 
then

an IllegalStateExcpetion should be thrown.

The problem is that the non-container thread is paused before the check
for the illegal state is made. Therefore, in the case where the
non-container thread is making a call at an inappropriate time, it is
paused and will never be unpaused. We need to check the state is valid
before pausing the thread.

Drat. This is already checked and I missed it when reviewing the code.

That narrows the possible causes down to:
- The container thread never exits the Servlet.service() method. A
thread dump will confirm or eliminate that possibility.
- The container thread experiences an error which causes it not to
unpause the non-container thread. I've been over the code several times
and short of an OOME, ThreadDeath or similar error I don't see how this
could happen.

It would be worth checking the logs before the point where this problem
occurs for any errors.

I'm going to review the code again to see if I can see any other
potential problems.

One option is to refactor the pause method and check periodically if the
thread should still be paused. It might be worth adding some debug
logging to get a better idea of what is going on when this issue occurs.

Mark


-
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org

I can provide you with the logs. What Tomcat-level log settings do you 
want me to use? I can also provide you with the thread dump upon the 
thread being stuck. I can also try to get an isolated test case that 
is able to reproduce the issue, but it will take some time.


Regards,
Jeroen...

Hello Mark,

I have been trying to get an isolated test case reproducing the issue, 
but so far without luck. I'll try to reproduce it again with the 
original application and get a thread dump for you at the time it got 
stuck. Is there any Tomcat-level logging you want to see as well leading 
up to the stuck thread issue?


Thanks,
Jeroen...

-
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org



Re: Stuck thread when invoking the AsyncContext.complete() method

2016-02-05 Thread Jeroen van Ooststroom

On 05/02/16 12:05, Mark Thomas wrote:

On 05/02/2016 09:00, Mark Thomas wrote:

On 05/02/2016 00:30, Jeroen van Ooststroom wrote:

Hello,

Using Tomcat 8.0.23 and Tomcat 8.0.30 with Java 1.7.0_25 on CentOS 5.11
we are getting a stuck thread issue when our server is on high(er) load.
It seems to happen when one of our non-Container Threads invokes the
AsyncContext.complete() method while the AsyncStateMachine class’ state
class member is set to STARTING instead of STARTED, resulting in the
invocation of the Object’s wait() method in the
pauseNonContainerThread() method. It never seems to recover from this.

I tried a couple of things within our source code trying to get around
this:

1. If HttpServletRequest.isAsyncStarted() returns true invoke
AsyncContext.complete(), but if it returns false

That should not be possible. And I think I have figured out what the
problem is. Let me explain.

The expected sequence of processing is:

0) Servlet.service() -> entry
1) Servlet.service() -> call startAsync()
2) Servlet.service() -> start a non-container thread to do some
processing
3) Servlet.service() -> exits
4) non-container thread calls complete()

Each connection has a dedicated async state machine that tracks the
current state of the connection.

At point 0 the state machine is in the state DISPATCHED (a.k.a. not async)
At point 1 the state machine transitions to STARTING.
At point 3 the state machine transitions to STARTED.
At point 4 the state machine transitions to COMPLETING, a container
thread is assigned which fires the onComplete() event for any listeners
and then changes state to DISPATCHED.

The point of the waits is to ensure that step 4) always occurs after
step 3). If a non-container thread is paused, the aynsc post-processing
code for container threads will unpause the thread once it completes.

If a non-container thread calls complete() at an inappropriate time then
an IllegalStateExcpetion should be thrown.

The problem is that the non-container thread is paused before the check
for the illegal state is made. Therefore, in the case where the
non-container thread is making a call at an inappropriate time, it is
paused and will never be unpaused. We need to check the state is valid
before pausing the thread.

Drat. This is already checked and I missed it when reviewing the code.

That narrows the possible causes down to:
- The container thread never exits the Servlet.service() method. A
thread dump will confirm or eliminate that possibility.
- The container thread experiences an error which causes it not to
unpause the non-container thread. I've been over the code several times
and short of an OOME, ThreadDeath or similar error I don't see how this
could happen.

It would be worth checking the logs before the point where this problem
occurs for any errors.

I'm going to review the code again to see if I can see any other
potential problems.

One option is to refactor the pause method and check periodically if the
thread should still be paused. It might be worth adding some debug
logging to get a better idea of what is going on when this issue occurs.

Mark


-
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org

I can provide you with the logs. What Tomcat-level log settings do you 
want me to use? I can also provide you with the thread dump upon the 
thread being stuck. I can also try to get an isolated test case that is 
able to reproduce the issue, but it will take some time.


Regards,
Jeroen...

-
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org



Stuck thread when invoking the AsyncContext.complete() method

2016-02-04 Thread Jeroen van Ooststroom

Hello,

Using Tomcat 8.0.23 and Tomcat 8.0.30 with Java 1.7.0_25 on CentOS 5.11 
we are getting a stuck thread issue when our server is on high(er) load. 
It seems to happen when one of our non-Container Threads invokes the 
AsyncContext.complete() method while the AsyncStateMachine class’ state 
class member is set to STARTING instead of STARTED, resulting in the 
invocation of the Object’s wait() method in the 
pauseNonContainerThread() method. It never seems to recover from this.


I tried a couple of things within our source code trying to get around this:

1. If HttpServletRequest.isAsyncStarted() returns true invoke
   AsyncContext.complete(), but if it returns false sleep for an
   arbitrary time and try again. Even if
   HttpServletRequest.isAsyncStarted() returned true and I invoke
   AsyncContext.complete() on occasion we still get the described
   problem. I'm not sure if the HttpServletRequest.isAsyncStarted()
   correlates with the AsyncStateMachine class’ state class member.
2. I tried a similar approach using the AsyncListener class, but again
   without consistent success.
3. I tried enforcing an arbitrary delay even before invoking the
   AsyncContext.complete() while HttpServletRequest.isAsyncStarted()
   returned true, but again without consistent success.

I also tried a change within Tomcat's source code trying to get around this:

1. Instead of invoking the Object's wait() method I tried to invoke the
   Object's wait(long timeout) method and checking if the state class
   member ever seems to change from STARTING to STARTED when in this
   situation. This didn't help either and the state class member seems
   to be stuck in STARTING.

private synchronized void pauseNonContainerThread() {
while (!ContainerThreadMarker.isContainerThread() &&
state.getPauseNonContainerThread()) {
try {
System.out.println("*** pauseNonContainerThread ***
   (before wait :: state: '" + state + "')");
   *wait(500);*
System.out.println("*** pauseNonContainerThread ***
   (after wait)");
} catch (InterruptedException e) {
// TODO Log this?
}
}
}

I can replicate the issue, but it takes quite a number of load tries 
before I get it replicated. It also seems to be harder to replicate this 
on Tomcat 8.0.30 than it is on Tomcat 8.0.23. I don’t have an isolated 
test case (yet) that I can upload somewhere. However, I can provide logs 
and thread dumps if needed.


In addition, looking at the Tomcat source code, I couldn’t figure out 
what/who is supposed to invoke the Object's notify() method on the 
non-Container Thread once Tomcat decided to invoke the Object’s wait() 
method on it. As Tomcat decided to invoke the Object’s wait() method on 
it I kind of expect Tomcat to be responsible for invoking the Object's 
notify() method on it eventually when the right conditions are met or 
something. I’m not saying this is a/the problem though, I'm just trying 
to understand what would/should happen when you get into this state.


Regards,
Jeroen...