Re: Stuck thread when invoking the AsyncContext.complete() method

2016-02-10 Thread Mark Thomas
On 10/02/2016 09:20, Mark Thomas wrote:
> On 09/02/2016 16:19, Jeroen van Ooststroom wrote:
> 
>>> Hello Mark,
>>>
>>> I have been trying to get an isolated test case reproducing the issue,
>>> but so far without luck. I'll try to reproduce it again with the
>>> original application and get a thread dump for you at the time it got
>>> stuck. Is there any Tomcat-level logging you want to see as well
>>> leading up to the stuck thread issue?
>>>
>>> Thanks,
>>> Jeroen...
>> Hello Mark,
>>
>> See the thread dump attached.
> 
> Thanks. Is it edited? There are some "..." sections in the thread dump
> that don't look normal.
> 
> A further piece of information that would be helpful is the version of
> IceFaces you are using.

I've been looking at the source code for IcePush 4.1.0 and I have a
theory. Keep in mind I don't know the IcePush code at all.

1. Container Thread A ends up in AsyncAdaptingServlet.service()
   This calls request.startAsync() followed by pushServer.service().
   This moves the async state to STARTING which will only change to
   STARTED once the container thread exists the
   AsyncAdaptingServlet.service()

2. Assuming (as it appears from the stack traces) that pushServer is an
   instance of BlockingConnectionServer, this calls activeServer.service

3. activeServer is an instance of RunningServer. The serviceMethod calls
   pendingRequest.put(pushRequest). Call this push request A.

4. Meanwhile, non-container thread B is running the "Notification Queue
   Consumer". It holds a lock on the BlockingConnectionServer instance.
   It tries to process push request A and calls complete().
   This blocks because the AsyncContext is still in state STARTING for
   that request.

5. Container thread A continues and reaches sendNotifications(). This
   blocks because that method is synchronized and non-container thread B
   currently holds the lock.

At this point deadlock occurs.

The async request is in state STARTING.

That state can't change until container thread A exits the service() method.

Container Thread A is blocked in the service() method waiting for a lock
on BlockingConnectionServer.

Container Thread B holds the lock on BlockingConnectionServer and won't
release it until the the async request changes state to STARTED.


My guess is that the deadlock only occurs if the "Notification Queue
Consumer" happens to try processing pending requests in the short period
of time between container thread A calling
pendingRequest.put(pushRequest) and sendNotifications().

The Servlet spec is clear that any call to AsyncContext.complete() will
not complete until the associated container thread has completed
processing the original request. On that basis it is my current view
that this is an ICEpush bug.

You should be able to confirm this fairly easily is you use a debugge
HTH,

Mark

-
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org



Re: Stuck thread when invoking the AsyncContext.complete() method

2016-02-10 Thread Jeroen van Ooststroom

On 10/02/16 12:00, Mark Thomas wrote:

On 10/02/2016 09:20, Mark Thomas wrote:

On 09/02/2016 16:19, Jeroen van Ooststroom wrote:


Hello Mark,

I have been trying to get an isolated test case reproducing the issue,
but so far without luck. I'll try to reproduce it again with the
original application and get a thread dump for you at the time it got
stuck. Is there any Tomcat-level logging you want to see as well
leading up to the stuck thread issue?

Thanks,
Jeroen...

Hello Mark,

See the thread dump attached.

Thanks. Is it edited? There are some "..." sections in the thread dump
that don't look normal.

A further piece of information that would be helpful is the version of
IceFaces you are using.

I've been looking at the source code for IcePush 4.1.0 and I have a
theory. Keep in mind I don't know the IcePush code at all.

1. Container Thread A ends up in AsyncAdaptingServlet.service()
This calls request.startAsync() followed by pushServer.service().
This moves the async state to STARTING which will only change to
STARTED once the container thread exists the
AsyncAdaptingServlet.service()

2. Assuming (as it appears from the stack traces) that pushServer is an
instance of BlockingConnectionServer, this calls activeServer.service

3. activeServer is an instance of RunningServer. The serviceMethod calls
pendingRequest.put(pushRequest). Call this push request A.

4. Meanwhile, non-container thread B is running the "Notification Queue
Consumer". It holds a lock on the BlockingConnectionServer instance.
It tries to process push request A and calls complete().
This blocks because the AsyncContext is still in state STARTING for
that request.

5. Container thread A continues and reaches sendNotifications(). This
blocks because that method is synchronized and non-container thread B
currently holds the lock.

At this point deadlock occurs.

The async request is in state STARTING.

That state can't change until container thread A exits the service() method.

Container Thread A is blocked in the service() method waiting for a lock
on BlockingConnectionServer.

Container Thread B holds the lock on BlockingConnectionServer and won't
release it until the the async request changes state to STARTED.


My guess is that the deadlock only occurs if the "Notification Queue
Consumer" happens to try processing pending requests in the short period
of time between container thread A calling
pendingRequest.put(pushRequest) and sendNotifications().

The Servlet spec is clear that any call to AsyncContext.complete() will
not complete until the associated container thread has completed
processing the original request. On that basis it is my current view
that this is an ICEpush bug.

You should be able to confirm this fairly easily is you use a debugge
HTH,

Mark

-
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org

This seems to be a valid theory. If I can reproduce it using open source 
version of ICEfaces, I'll dig a bit further to see if I can prove this 
to be an ICEfaces bug.


Thanks again,
Jeroen...

-
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org



Re: Stuck thread when invoking the AsyncContext.complete() method

2016-02-10 Thread Jeroen van Ooststroom

Hello Mark,

Thank you for looking into this in more detail. I did edit the thread 
dump, but just to remove what I believed to be irrelevant pieces. I did 
not remove any lock information, just bits of stack traces of ICEfaces.


This was using ICEfaces-EE 3.3.0.GA_P03. I'll have a look to see if I 
can reproduce it using an open source version of ICEfaces.


Thanks,
Jeroen...

On 10/02/16 10:20, Mark Thomas wrote:

On 09/02/2016 16:19, Jeroen van Ooststroom wrote:


Hello Mark,

I have been trying to get an isolated test case reproducing the issue,
but so far without luck. I'll try to reproduce it again with the
original application and get a thread dump for you at the time it got
stuck. Is there any Tomcat-level logging you want to see as well
leading up to the stuck thread issue?

Thanks,
Jeroen...

Hello Mark,

See the thread dump attached.

Thanks. Is it edited? There are some "..." sections in the thread dump
that don't look normal.

A further piece of information that would be helpful is the version of
IceFaces you are using.

Thanks,

Mark


-
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org




-
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org



Re: Stuck thread when invoking the AsyncContext.complete() method

2016-02-10 Thread Mark Thomas
On 10/02/2016 11:00, Mark Thomas wrote:
> On 10/02/2016 09:20, Mark Thomas wrote:
>> On 09/02/2016 16:19, Jeroen van Ooststroom wrote:
>>
 Hello Mark,

 I have been trying to get an isolated test case reproducing the issue,
 but so far without luck. I'll try to reproduce it again with the
 original application and get a thread dump for you at the time it got
 stuck. Is there any Tomcat-level logging you want to see as well
 leading up to the stuck thread issue?

 Thanks,
 Jeroen...
>>> Hello Mark,
>>>
>>> See the thread dump attached.
>>
>> Thanks. Is it edited? There are some "..." sections in the thread dump
>> that don't look normal.
>>
>> A further piece of information that would be helpful is the version of
>> IceFaces you are using.
> 
> I've been looking at the source code for IcePush 4.1.0 and I have a
> theory. Keep in mind I don't know the IcePush code at all.
> 
> 1. Container Thread A ends up in AsyncAdaptingServlet.service()
>This calls request.startAsync() followed by pushServer.service().
>This moves the async state to STARTING which will only change to
>STARTED once the container thread exists the
>AsyncAdaptingServlet.service()
> 
> 2. Assuming (as it appears from the stack traces) that pushServer is an
>instance of BlockingConnectionServer, this calls activeServer.service
> 
> 3. activeServer is an instance of RunningServer. The serviceMethod calls
>pendingRequest.put(pushRequest). Call this push request A.
> 
> 4. Meanwhile, non-container thread B is running the "Notification Queue
>Consumer". It holds a lock on the BlockingConnectionServer instance.
>It tries to process push request A and calls complete().
>This blocks because the AsyncContext is still in state STARTING for
>that request.
> 
> 5. Container thread A continues and reaches sendNotifications(). This
>blocks because that method is synchronized and non-container thread B
>currently holds the lock.
> 
> At this point deadlock occurs.
> 
> The async request is in state STARTING.
> 
> That state can't change until container thread A exits the service() method.
> 
> Container Thread A is blocked in the service() method waiting for a lock
> on BlockingConnectionServer.
> 
> Container Thread B holds the lock on BlockingConnectionServer and won't
> release it until the the async request changes state to STARTED.
> 
> 
> My guess is that the deadlock only occurs if the "Notification Queue
> Consumer" happens to try processing pending requests in the short period
> of time between container thread A calling
> pendingRequest.put(pushRequest) and sendNotifications().
> 
> The Servlet spec is clear that any call to AsyncContext.complete() will
> not complete until the associated container thread has completed
> processing the original request. On that basis it is my current view
> that this is an ICEpush bug.
> 
> You should be able to confirm this fairly easily is you use a debugge

Bah. Hit send too soon.

You should be able to confirm this fairly easily if you use a debugger
to pause a container thread between the calls to
pendingRequest.put(pushRequest) and sendNotifications(). That should
then trigger the problem.

Mark


-
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org



Re: Stuck thread when invoking the AsyncContext.complete() method

2016-02-10 Thread Mark Thomas
On 09/02/2016 16:19, Jeroen van Ooststroom wrote:

>> Hello Mark,
>>
>> I have been trying to get an isolated test case reproducing the issue,
>> but so far without luck. I'll try to reproduce it again with the
>> original application and get a thread dump for you at the time it got
>> stuck. Is there any Tomcat-level logging you want to see as well
>> leading up to the stuck thread issue?
>>
>> Thanks,
>> Jeroen...
> Hello Mark,
> 
> See the thread dump attached.

Thanks. Is it edited? There are some "..." sections in the thread dump
that don't look normal.

A further piece of information that would be helpful is the version of
IceFaces you are using.

Thanks,

Mark


-
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org



Re: Stuck thread when invoking the AsyncContext.complete() method

2016-02-09 Thread Jeroen van Ooststroom

On 05/02/16 13:46, Jeroen van Ooststroom wrote:

On 05/02/16 12:05, Mark Thomas wrote:

On 05/02/2016 09:00, Mark Thomas wrote:

On 05/02/2016 00:30, Jeroen van Ooststroom wrote:

Hello,

Using Tomcat 8.0.23 and Tomcat 8.0.30 with Java 1.7.0_25 on CentOS 
5.11
we are getting a stuck thread issue when our server is on high(er) 
load.

It seems to happen when one of our non-Container Threads invokes the
AsyncContext.complete() method while the AsyncStateMachine class’ 
state

class member is set to STARTING instead of STARTED, resulting in the
invocation of the Object’s wait() method in the
pauseNonContainerThread() method. It never seems to recover from this.

I tried a couple of things within our source code trying to get around
this:

1. If HttpServletRequest.isAsyncStarted() returns true invoke
AsyncContext.complete(), but if it returns false

That should not be possible. And I think I have figured out what the
problem is. Let me explain.

The expected sequence of processing is:

0) Servlet.service() -> entry
1) Servlet.service() -> call startAsync()
2) Servlet.service() -> start a non-container thread to do some
processing
3) Servlet.service() -> exits
4) non-container thread calls complete()

Each connection has a dedicated async state machine that tracks the
current state of the connection.

At point 0 the state machine is in the state DISPATCHED (a.k.a. not 
async)

At point 1 the state machine transitions to STARTING.
At point 3 the state machine transitions to STARTED.
At point 4 the state machine transitions to COMPLETING, a container
thread is assigned which fires the onComplete() event for any listeners
and then changes state to DISPATCHED.

The point of the waits is to ensure that step 4) always occurs after
step 3). If a non-container thread is paused, the aynsc post-processing
code for container threads will unpause the thread once it completes.

If a non-container thread calls complete() at an inappropriate time 
then

an IllegalStateExcpetion should be thrown.

The problem is that the non-container thread is paused before the check
for the illegal state is made. Therefore, in the case where the
non-container thread is making a call at an inappropriate time, it is
paused and will never be unpaused. We need to check the state is valid
before pausing the thread.

Drat. This is already checked and I missed it when reviewing the code.

That narrows the possible causes down to:
- The container thread never exits the Servlet.service() method. A
thread dump will confirm or eliminate that possibility.
- The container thread experiences an error which causes it not to
unpause the non-container thread. I've been over the code several times
and short of an OOME, ThreadDeath or similar error I don't see how this
could happen.

It would be worth checking the logs before the point where this problem
occurs for any errors.

I'm going to review the code again to see if I can see any other
potential problems.

One option is to refactor the pause method and check periodically if the
thread should still be paused. It might be worth adding some debug
logging to get a better idea of what is going on when this issue occurs.

Mark


-
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org

I can provide you with the logs. What Tomcat-level log settings do you 
want me to use? I can also provide you with the thread dump upon the 
thread being stuck. I can also try to get an isolated test case that 
is able to reproduce the issue, but it will take some time.


Regards,
Jeroen...

Hello Mark,

I have been trying to get an isolated test case reproducing the issue, 
but so far without luck. I'll try to reproduce it again with the 
original application and get a thread dump for you at the time it got 
stuck. Is there any Tomcat-level logging you want to see as well leading 
up to the stuck thread issue?


Thanks,
Jeroen...

-
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org



Re: Stuck thread when invoking the AsyncContext.complete() method

2016-02-05 Thread Mark Thomas
On 05/02/2016 00:30, Jeroen van Ooststroom wrote:
> Hello,
> 
> Using Tomcat 8.0.23 and Tomcat 8.0.30 with Java 1.7.0_25 on CentOS 5.11
> we are getting a stuck thread issue when our server is on high(er) load.
> It seems to happen when one of our non-Container Threads invokes the
> AsyncContext.complete() method while the AsyncStateMachine class’ state
> class member is set to STARTING instead of STARTED, resulting in the
> invocation of the Object’s wait() method in the
> pauseNonContainerThread() method. It never seems to recover from this.
> 
> I tried a couple of things within our source code trying to get around
> this:
> 
> 1. If HttpServletRequest.isAsyncStarted() returns true invoke
>AsyncContext.complete(), but if it returns false

That should not be possible. And I think I have figured out what the
problem is. Let me explain.

The expected sequence of processing is:

0) Servlet.service() -> entry
1) Servlet.service() -> call startAsync()
2) Servlet.service() -> start a non-container thread to do some
   processing
3) Servlet.service() -> exits
4) non-container thread calls complete()

Each connection has a dedicated async state machine that tracks the
current state of the connection.

At point 0 the state machine is in the state DISPATCHED (a.k.a. not async)
At point 1 the state machine transitions to STARTING.
At point 3 the state machine transitions to STARTED.
At point 4 the state machine transitions to COMPLETING, a container
thread is assigned which fires the onComplete() event for any listeners
and then changes state to DISPATCHED.

The point of the waits is to ensure that step 4) always occurs after
step 3). If a non-container thread is paused, the aynsc post-processing
code for container threads will unpause the thread once it completes.

If a non-container thread calls complete() at an inappropriate time then
an IllegalStateExcpetion should be thrown.

The problem is that the non-container thread is paused before the check
for the illegal state is made. Therefore, in the case where the
non-container thread is making a call at an inappropriate time, it is
paused and will never be unpaused. We need to check the state is valid
before pausing the thread.

I should be able to get this fixed today. We've just missed the recent
9.0.x and 8.0.x tags so I'd expect the fix to be available early next month.

Mark


-
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org



Re: Stuck thread when invoking the AsyncContext.complete() method

2016-02-05 Thread Mark Thomas
On 05/02/2016 09:00, Mark Thomas wrote:
> On 05/02/2016 00:30, Jeroen van Ooststroom wrote:
>> Hello,
>>
>> Using Tomcat 8.0.23 and Tomcat 8.0.30 with Java 1.7.0_25 on CentOS 5.11
>> we are getting a stuck thread issue when our server is on high(er) load.
>> It seems to happen when one of our non-Container Threads invokes the
>> AsyncContext.complete() method while the AsyncStateMachine class’ state
>> class member is set to STARTING instead of STARTED, resulting in the
>> invocation of the Object’s wait() method in the
>> pauseNonContainerThread() method. It never seems to recover from this.
>>
>> I tried a couple of things within our source code trying to get around
>> this:
>>
>> 1. If HttpServletRequest.isAsyncStarted() returns true invoke
>>AsyncContext.complete(), but if it returns false
> 
> That should not be possible. And I think I have figured out what the
> problem is. Let me explain.
> 
> The expected sequence of processing is:
> 
> 0) Servlet.service() -> entry
> 1) Servlet.service() -> call startAsync()
> 2) Servlet.service() -> start a non-container thread to do some
>processing
> 3) Servlet.service() -> exits
> 4) non-container thread calls complete()
> 
> Each connection has a dedicated async state machine that tracks the
> current state of the connection.
> 
> At point 0 the state machine is in the state DISPATCHED (a.k.a. not async)
> At point 1 the state machine transitions to STARTING.
> At point 3 the state machine transitions to STARTED.
> At point 4 the state machine transitions to COMPLETING, a container
> thread is assigned which fires the onComplete() event for any listeners
> and then changes state to DISPATCHED.
> 
> The point of the waits is to ensure that step 4) always occurs after
> step 3). If a non-container thread is paused, the aynsc post-processing
> code for container threads will unpause the thread once it completes.
> 
> If a non-container thread calls complete() at an inappropriate time then
> an IllegalStateExcpetion should be thrown.
> 
> The problem is that the non-container thread is paused before the check
> for the illegal state is made. Therefore, in the case where the
> non-container thread is making a call at an inappropriate time, it is
> paused and will never be unpaused. We need to check the state is valid
> before pausing the thread.

Drat. This is already checked and I missed it when reviewing the code.

That narrows the possible causes down to:
- The container thread never exits the Servlet.service() method. A
thread dump will confirm or eliminate that possibility.
- The container thread experiences an error which causes it not to
unpause the non-container thread. I've been over the code several times
and short of an OOME, ThreadDeath or similar error I don't see how this
could happen.

It would be worth checking the logs before the point where this problem
occurs for any errors.

I'm going to review the code again to see if I can see any other
potential problems.

One option is to refactor the pause method and check periodically if the
thread should still be paused. It might be worth adding some debug
logging to get a better idea of what is going on when this issue occurs.

Mark


-
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org



Re: Stuck thread when invoking the AsyncContext.complete() method

2016-02-05 Thread Jeroen van Ooststroom

On 05/02/16 12:05, Mark Thomas wrote:

On 05/02/2016 09:00, Mark Thomas wrote:

On 05/02/2016 00:30, Jeroen van Ooststroom wrote:

Hello,

Using Tomcat 8.0.23 and Tomcat 8.0.30 with Java 1.7.0_25 on CentOS 5.11
we are getting a stuck thread issue when our server is on high(er) load.
It seems to happen when one of our non-Container Threads invokes the
AsyncContext.complete() method while the AsyncStateMachine class’ state
class member is set to STARTING instead of STARTED, resulting in the
invocation of the Object’s wait() method in the
pauseNonContainerThread() method. It never seems to recover from this.

I tried a couple of things within our source code trying to get around
this:

1. If HttpServletRequest.isAsyncStarted() returns true invoke
AsyncContext.complete(), but if it returns false

That should not be possible. And I think I have figured out what the
problem is. Let me explain.

The expected sequence of processing is:

0) Servlet.service() -> entry
1) Servlet.service() -> call startAsync()
2) Servlet.service() -> start a non-container thread to do some
processing
3) Servlet.service() -> exits
4) non-container thread calls complete()

Each connection has a dedicated async state machine that tracks the
current state of the connection.

At point 0 the state machine is in the state DISPATCHED (a.k.a. not async)
At point 1 the state machine transitions to STARTING.
At point 3 the state machine transitions to STARTED.
At point 4 the state machine transitions to COMPLETING, a container
thread is assigned which fires the onComplete() event for any listeners
and then changes state to DISPATCHED.

The point of the waits is to ensure that step 4) always occurs after
step 3). If a non-container thread is paused, the aynsc post-processing
code for container threads will unpause the thread once it completes.

If a non-container thread calls complete() at an inappropriate time then
an IllegalStateExcpetion should be thrown.

The problem is that the non-container thread is paused before the check
for the illegal state is made. Therefore, in the case where the
non-container thread is making a call at an inappropriate time, it is
paused and will never be unpaused. We need to check the state is valid
before pausing the thread.

Drat. This is already checked and I missed it when reviewing the code.

That narrows the possible causes down to:
- The container thread never exits the Servlet.service() method. A
thread dump will confirm or eliminate that possibility.
- The container thread experiences an error which causes it not to
unpause the non-container thread. I've been over the code several times
and short of an OOME, ThreadDeath or similar error I don't see how this
could happen.

It would be worth checking the logs before the point where this problem
occurs for any errors.

I'm going to review the code again to see if I can see any other
potential problems.

One option is to refactor the pause method and check periodically if the
thread should still be paused. It might be worth adding some debug
logging to get a better idea of what is going on when this issue occurs.

Mark


-
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org

I can provide you with the logs. What Tomcat-level log settings do you 
want me to use? I can also provide you with the thread dump upon the 
thread being stuck. I can also try to get an isolated test case that is 
able to reproduce the issue, but it will take some time.


Regards,
Jeroen...

-
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org