Re: Stuck thread when invoking the AsyncContext.complete() method
On 10/02/16 12:00, Mark Thomas wrote: On 10/02/2016 09:20, Mark Thomas wrote: On 09/02/2016 16:19, Jeroen van Ooststroom wrote: Hello Mark, I have been trying to get an isolated test case reproducing the issue, but so far without luck. I'll try to reproduce it again with the original application and get a thread dump for you at the time it got stuck. Is there any Tomcat-level logging you want to see as well leading up to the stuck thread issue? Thanks, Jeroen... Hello Mark, See the thread dump attached. Thanks. Is it edited? There are some "..." sections in the thread dump that don't look normal. A further piece of information that would be helpful is the version of IceFaces you are using. I've been looking at the source code for IcePush 4.1.0 and I have a theory. Keep in mind I don't know the IcePush code at all. 1. Container Thread A ends up in AsyncAdaptingServlet.service() This calls request.startAsync() followed by pushServer.service(). This moves the async state to STARTING which will only change to STARTED once the container thread exists the AsyncAdaptingServlet.service() 2. Assuming (as it appears from the stack traces) that pushServer is an instance of BlockingConnectionServer, this calls activeServer.service 3. activeServer is an instance of RunningServer. The serviceMethod calls pendingRequest.put(pushRequest). Call this push request A. 4. Meanwhile, non-container thread B is running the "Notification Queue Consumer". It holds a lock on the BlockingConnectionServer instance. It tries to process push request A and calls complete(). This blocks because the AsyncContext is still in state STARTING for that request. 5. Container thread A continues and reaches sendNotifications(). This blocks because that method is synchronized and non-container thread B currently holds the lock. At this point deadlock occurs. The async request is in state STARTING. That state can't change until container thread A exits the service() method. Container Thread A is blocked in the service() method waiting for a lock on BlockingConnectionServer. Container Thread B holds the lock on BlockingConnectionServer and won't release it until the the async request changes state to STARTED. My guess is that the deadlock only occurs if the "Notification Queue Consumer" happens to try processing pending requests in the short period of time between container thread A calling pendingRequest.put(pushRequest) and sendNotifications(). The Servlet spec is clear that any call to AsyncContext.complete() will not complete until the associated container thread has completed processing the original request. On that basis it is my current view that this is an ICEpush bug. You should be able to confirm this fairly easily is you use a debugge HTH, Mark - To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org For additional commands, e-mail: users-h...@tomcat.apache.org This seems to be a valid theory. If I can reproduce it using open source version of ICEfaces, I'll dig a bit further to see if I can prove this to be an ICEfaces bug. Thanks again, Jeroen... - To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org For additional commands, e-mail: users-h...@tomcat.apache.org
Re: Stuck thread when invoking the AsyncContext.complete() method
Hello Mark, Thank you for looking into this in more detail. I did edit the thread dump, but just to remove what I believed to be irrelevant pieces. I did not remove any lock information, just bits of stack traces of ICEfaces. This was using ICEfaces-EE 3.3.0.GA_P03. I'll have a look to see if I can reproduce it using an open source version of ICEfaces. Thanks, Jeroen... On 10/02/16 10:20, Mark Thomas wrote: On 09/02/2016 16:19, Jeroen van Ooststroom wrote: Hello Mark, I have been trying to get an isolated test case reproducing the issue, but so far without luck. I'll try to reproduce it again with the original application and get a thread dump for you at the time it got stuck. Is there any Tomcat-level logging you want to see as well leading up to the stuck thread issue? Thanks, Jeroen... Hello Mark, See the thread dump attached. Thanks. Is it edited? There are some "..." sections in the thread dump that don't look normal. A further piece of information that would be helpful is the version of IceFaces you are using. Thanks, Mark - To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org For additional commands, e-mail: users-h...@tomcat.apache.org - To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org For additional commands, e-mail: users-h...@tomcat.apache.org
Re: Stuck thread when invoking the AsyncContext.complete() method
On 05/02/16 13:46, Jeroen van Ooststroom wrote: On 05/02/16 12:05, Mark Thomas wrote: On 05/02/2016 09:00, Mark Thomas wrote: On 05/02/2016 00:30, Jeroen van Ooststroom wrote: Hello, Using Tomcat 8.0.23 and Tomcat 8.0.30 with Java 1.7.0_25 on CentOS 5.11 we are getting a stuck thread issue when our server is on high(er) load. It seems to happen when one of our non-Container Threads invokes the AsyncContext.complete() method while the AsyncStateMachine class’ state class member is set to STARTING instead of STARTED, resulting in the invocation of the Object’s wait() method in the pauseNonContainerThread() method. It never seems to recover from this. I tried a couple of things within our source code trying to get around this: 1. If HttpServletRequest.isAsyncStarted() returns true invoke AsyncContext.complete(), but if it returns false That should not be possible. And I think I have figured out what the problem is. Let me explain. The expected sequence of processing is: 0) Servlet.service() -> entry 1) Servlet.service() -> call startAsync() 2) Servlet.service() -> start a non-container thread to do some processing 3) Servlet.service() -> exits 4) non-container thread calls complete() Each connection has a dedicated async state machine that tracks the current state of the connection. At point 0 the state machine is in the state DISPATCHED (a.k.a. not async) At point 1 the state machine transitions to STARTING. At point 3 the state machine transitions to STARTED. At point 4 the state machine transitions to COMPLETING, a container thread is assigned which fires the onComplete() event for any listeners and then changes state to DISPATCHED. The point of the waits is to ensure that step 4) always occurs after step 3). If a non-container thread is paused, the aynsc post-processing code for container threads will unpause the thread once it completes. If a non-container thread calls complete() at an inappropriate time then an IllegalStateExcpetion should be thrown. The problem is that the non-container thread is paused before the check for the illegal state is made. Therefore, in the case where the non-container thread is making a call at an inappropriate time, it is paused and will never be unpaused. We need to check the state is valid before pausing the thread. Drat. This is already checked and I missed it when reviewing the code. That narrows the possible causes down to: - The container thread never exits the Servlet.service() method. A thread dump will confirm or eliminate that possibility. - The container thread experiences an error which causes it not to unpause the non-container thread. I've been over the code several times and short of an OOME, ThreadDeath or similar error I don't see how this could happen. It would be worth checking the logs before the point where this problem occurs for any errors. I'm going to review the code again to see if I can see any other potential problems. One option is to refactor the pause method and check periodically if the thread should still be paused. It might be worth adding some debug logging to get a better idea of what is going on when this issue occurs. Mark - To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org For additional commands, e-mail: users-h...@tomcat.apache.org I can provide you with the logs. What Tomcat-level log settings do you want me to use? I can also provide you with the thread dump upon the thread being stuck. I can also try to get an isolated test case that is able to reproduce the issue, but it will take some time. Regards, Jeroen... Hello Mark, I have been trying to get an isolated test case reproducing the issue, but so far without luck. I'll try to reproduce it again with the original application and get a thread dump for you at the time it got stuck. Is there any Tomcat-level logging you want to see as well leading up to the stuck thread issue? Thanks, Jeroen... - To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org For additional commands, e-mail: users-h...@tomcat.apache.org
Re: Stuck thread when invoking the AsyncContext.complete() method
On 05/02/16 12:05, Mark Thomas wrote: On 05/02/2016 09:00, Mark Thomas wrote: On 05/02/2016 00:30, Jeroen van Ooststroom wrote: Hello, Using Tomcat 8.0.23 and Tomcat 8.0.30 with Java 1.7.0_25 on CentOS 5.11 we are getting a stuck thread issue when our server is on high(er) load. It seems to happen when one of our non-Container Threads invokes the AsyncContext.complete() method while the AsyncStateMachine class’ state class member is set to STARTING instead of STARTED, resulting in the invocation of the Object’s wait() method in the pauseNonContainerThread() method. It never seems to recover from this. I tried a couple of things within our source code trying to get around this: 1. If HttpServletRequest.isAsyncStarted() returns true invoke AsyncContext.complete(), but if it returns false That should not be possible. And I think I have figured out what the problem is. Let me explain. The expected sequence of processing is: 0) Servlet.service() -> entry 1) Servlet.service() -> call startAsync() 2) Servlet.service() -> start a non-container thread to do some processing 3) Servlet.service() -> exits 4) non-container thread calls complete() Each connection has a dedicated async state machine that tracks the current state of the connection. At point 0 the state machine is in the state DISPATCHED (a.k.a. not async) At point 1 the state machine transitions to STARTING. At point 3 the state machine transitions to STARTED. At point 4 the state machine transitions to COMPLETING, a container thread is assigned which fires the onComplete() event for any listeners and then changes state to DISPATCHED. The point of the waits is to ensure that step 4) always occurs after step 3). If a non-container thread is paused, the aynsc post-processing code for container threads will unpause the thread once it completes. If a non-container thread calls complete() at an inappropriate time then an IllegalStateExcpetion should be thrown. The problem is that the non-container thread is paused before the check for the illegal state is made. Therefore, in the case where the non-container thread is making a call at an inappropriate time, it is paused and will never be unpaused. We need to check the state is valid before pausing the thread. Drat. This is already checked and I missed it when reviewing the code. That narrows the possible causes down to: - The container thread never exits the Servlet.service() method. A thread dump will confirm or eliminate that possibility. - The container thread experiences an error which causes it not to unpause the non-container thread. I've been over the code several times and short of an OOME, ThreadDeath or similar error I don't see how this could happen. It would be worth checking the logs before the point where this problem occurs for any errors. I'm going to review the code again to see if I can see any other potential problems. One option is to refactor the pause method and check periodically if the thread should still be paused. It might be worth adding some debug logging to get a better idea of what is going on when this issue occurs. Mark - To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org For additional commands, e-mail: users-h...@tomcat.apache.org I can provide you with the logs. What Tomcat-level log settings do you want me to use? I can also provide you with the thread dump upon the thread being stuck. I can also try to get an isolated test case that is able to reproduce the issue, but it will take some time. Regards, Jeroen... - To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org For additional commands, e-mail: users-h...@tomcat.apache.org
Stuck thread when invoking the AsyncContext.complete() method
Hello, Using Tomcat 8.0.23 and Tomcat 8.0.30 with Java 1.7.0_25 on CentOS 5.11 we are getting a stuck thread issue when our server is on high(er) load. It seems to happen when one of our non-Container Threads invokes the AsyncContext.complete() method while the AsyncStateMachine class’ state class member is set to STARTING instead of STARTED, resulting in the invocation of the Object’s wait() method in the pauseNonContainerThread() method. It never seems to recover from this. I tried a couple of things within our source code trying to get around this: 1. If HttpServletRequest.isAsyncStarted() returns true invoke AsyncContext.complete(), but if it returns false sleep for an arbitrary time and try again. Even if HttpServletRequest.isAsyncStarted() returned true and I invoke AsyncContext.complete() on occasion we still get the described problem. I'm not sure if the HttpServletRequest.isAsyncStarted() correlates with the AsyncStateMachine class’ state class member. 2. I tried a similar approach using the AsyncListener class, but again without consistent success. 3. I tried enforcing an arbitrary delay even before invoking the AsyncContext.complete() while HttpServletRequest.isAsyncStarted() returned true, but again without consistent success. I also tried a change within Tomcat's source code trying to get around this: 1. Instead of invoking the Object's wait() method I tried to invoke the Object's wait(long timeout) method and checking if the state class member ever seems to change from STARTING to STARTED when in this situation. This didn't help either and the state class member seems to be stuck in STARTING. private synchronized void pauseNonContainerThread() { while (!ContainerThreadMarker.isContainerThread() && state.getPauseNonContainerThread()) { try { System.out.println("*** pauseNonContainerThread *** (before wait :: state: '" + state + "')"); *wait(500);* System.out.println("*** pauseNonContainerThread *** (after wait)"); } catch (InterruptedException e) { // TODO Log this? } } } I can replicate the issue, but it takes quite a number of load tries before I get it replicated. It also seems to be harder to replicate this on Tomcat 8.0.30 than it is on Tomcat 8.0.23. I don’t have an isolated test case (yet) that I can upload somewhere. However, I can provide logs and thread dumps if needed. In addition, looking at the Tomcat source code, I couldn’t figure out what/who is supposed to invoke the Object's notify() method on the non-Container Thread once Tomcat decided to invoke the Object’s wait() method on it. As Tomcat decided to invoke the Object’s wait() method on it I kind of expect Tomcat to be responsible for invoking the Object's notify() method on it eventually when the right conditions are met or something. I’m not saying this is a/the problem though, I'm just trying to understand what would/should happen when you get into this state. Regards, Jeroen...