OK, we'll try this out then. One question about the regression: would it occur if the 2 nodes are in different Solaris containers (each with its own IP) but on the same physical host?
Thanks a lot!

Wong

On Wed, Aug 26, 2009 at 10:39 AM, Filip Hanik - Dev Lists <devli...@hanik.com> wrote:

> hi Wong, yes, that one does implement a higher level of thread safety, and
> most likely would resolve your problem.
> With 6.0.20, there is a regression where tomcat nodes on the same host won't
> discover each other
> https://issues.apache.org/bugzilla/show_bug.cgi?id=47308
>
> Filip
>
> On 08/25/2009 07:22 PM, CS Wong wrote:
>
>> A brief look through "svn log
>> http://svn.apache.org/repos/asf/tomcat/trunk/java/org/apache/catalina/ha/session/DeltaRequest.java"
>> turns up this:
>> ------------------------------------------------------------------------
>> r618823 | fhanik | 2008-02-06 07:29:56 +0800 (Wed, 06 Feb 2008) | 3 lines
>>
>> Remove synchronization on the DeltaRequest object, and let the object that
>> manages the delta request (session/manager) to handle the locking properly,
>> using the session lock
>> There is a case with a non sticky load balancer where using synchronized and
>> a lock (essentially two locks) can end up in a dead lock
>> ------------------------------------------------------------------------
>>
>> This is the only one where the commit comments seem to indicate anything
>> related to my issue. Given that 6.0.14 was released on 14 Aug 2007
>> (http://www.mail-archive.com/annou...@apache.org/msg00386.html), it may be
>> applicable.
>>
>> Would just like to know your opinion, is it likely that this is the issue
>> I'm facing? Thanks!
>>
>> Wong
>>
>> On Wed, Aug 26, 2009 at 8:48 AM, CS Wong<lilw...@gmail.com> wrote:
>>
>>> Thanks, Filip.
>>> I'm running 6.0.14 right now. Would you have any idea whether any changes
>>> in the code since then would have fixed something like this? I can try to
>>> push for an upgrade to 6.0.20 but the app owners would probably want to know
>>> whether it would be fixed for sure since they have to go through a rather
>>> troublesome round of testing which takes up quite a bit of time. It helps
>>> that they know that the problem won't reoccur once this has been done.
>>>
>>> Thanks,
>>> Wong
>>>
>>> On Tue, Aug 25, 2009 at 11:35 PM, Filip Hanik - Dev Lists<devli...@hanik.com> wrote:
>>>
>>>> I've taken a look at the code.
>>>> The fix for this is easy, but it doesn't explain why it happens. This is a
>>>> concurrency issue, but if you're not running the latest tomcat version, then
>>>> it could already have been fixed.
>>>>
>>>> best
>>>> Filip
>>>>
>>>> On 08/25/2009 01:55 AM, CS Wong wrote:
>>>>
>>>>> Hi Michael,
>>>>> The logs are the bit that went haywire. The applications at this point still
>>>>> work but often, there's not enough time to troubleshoot much else. The logs
>>>>> can increase by 5-6GB in a matter of an hour or so and hence, we often just
>>>>> kill the service (normal shutdown.sh doesn't respond any more at this point,
>>>>> we have to kill -9 it) in panic and delete the logs before the entire server
>>>>> goes kaboom. This time, I managed to tail out some of the logs, for which I
>>>>> pasted an extract (same repeating pattern of errors):
>>>>>
>>>>> Aug 25, 2009 11:44:02 AM org.apache.catalina.ha.session.DeltaRequest reset
>>>>> SEVERE: Unable to remove element
>>>>> java.util.NoSuchElementException
>>>>>     at java.util.LinkedList.remove(LinkedList.java:788)
>>>>>     at java.util.LinkedList.removeFirst(LinkedList.java:134)
>>>>>     at org.apache.catalina.ha.session.DeltaRequest.reset(DeltaRequest.java:201)
>>>>>     at org.apache.catalina.ha.session.DeltaRequest.execute(DeltaRequest.java:195)
>>>>>     at org.apache.catalina.ha.session.DeltaManager.handleSESSION_DELTA(DeltaManager.java:1364)
>>>>>     at org.apache.catalina.ha.session.DeltaManager.messageReceived(DeltaManager.java:1320)
>>>>>     at org.apache.catalina.ha.session.DeltaManager.messageDataReceived(DeltaManager.java:1083)
>>>>>     at org.apache.catalina.ha.session.ClusterSessionListener.messageReceived(ClusterSessionListener.java:87)
>>>>>     at org.apache.catalina.ha.tcp.SimpleTcpCluster.messageReceived(SimpleTcpCluster.java:916)
>>>>>     at org.apache.catalina.ha.tcp.SimpleTcpCluster.messageReceived(SimpleTcpCluster.java:897)
>>>>>     at org.apache.catalina.tribes.group.GroupChannel.messageReceived(GroupChannel.java:264)
>>>>>     at org.apache.catalina.tribes.group.ChannelInterceptorBase.messageReceived(ChannelInterceptorBase.java:79)
>>>>>     at org.apache.catalina.tribes.group.interceptors.TcpFailureDetector.messageReceived(TcpFailureDetector.java:110)
>>>>>     at org.apache.catalina.tribes.group.ChannelInterceptorBase.messageReceived(ChannelInterceptorBase.java:79)
>>>>>     at org.apache.catalina.tribes.group.ChannelInterceptorBase.messageReceived(ChannelInterceptorBase.java:79)
>>>>>     at org.apache.catalina.tribes.group.ChannelInterceptorBase.messageReceived(ChannelInterceptorBase.java:79)
>>>>>     at org.apache.catalina.tribes.group.ChannelCoordinator.messageReceived(ChannelCoordinator.java:241)
>>>>>     at org.apache.catalina.tribes.transport.ReceiverBase.messageDataReceived(ReceiverBase.java:225)
>>>>>     at org.apache.catalina.tribes.transport.nio.NioReplicationTask.drainChannel(NioReplicationTask.java:188)
>>>>>     at org.apache.catalina.tribes.transport.nio.NioReplicationTask.run(NioReplicationTask.java:91)
>>>>>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:885)
>>>>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
>>>>>     at java.lang.Thread.run(Thread.java:619)
>>>>>
>>>>> Wong
>>>>>
>>>>> On Tue, Aug 25, 2009 at 3:36 PM, Michael Ludwig<m...@as-guides.com> wrote:
>>>>>
>>>>>> CS Wong wrote:
>>>>>>
>>>>>>> Periodically, I'm getting problems with my Tomcat 6 cluster (2 nodes).
>>>>>>> One of the nodes would just go haywire
>>>>>>
>>>>>> Could you elaborate on what "going haywire" means?
>>>>>>
>>>>>> Below, you write:
>>>>>>
>>>>>>> [The NoSuchElementException is] the only thing that it shows. The
>>>>>>> other node in the cluster is still active at this time. There's
>>>>>>> nothing to do but to restart. The large amount of logs has caused
>>>>>>> disk space issues more than a couple of times too.
>>>>>>
>>>>>> So is that server not active any more? Unresponsive? Hyperactive writing
>>>>>> to the log file? Looping?
>>>>>>
>>>>>>> and generate a ton of logs repeating the following:
>>>>>>>
>>>>>>> Aug 25, 2009 11:44:10 AM org.apache.catalina.ha.session.DeltaRequest reset
>>>>>>> SEVERE: Unable to remove element
>>>>>>> java.util.NoSuchElementException
>>>>>>>     at java.util.LinkedList.remove(LinkedList.java:788)
>>>>>>>     at java.util.LinkedList.removeFirst(LinkedList.java:134)
>>>>>>>     at org.apache.catalina.ha.session.DeltaRequest.reset(DeltaRequest.java:201)
>>>>>>>     at org.apache.catalina.ha.session.DeltaRequest.execute(DeltaRequest.java:195)
>>>>>>>     at org.apache.catalina.ha.session.DeltaManager.handleSESSION_DELTA(DeltaManager.java:1364)
>>>>>>>     at org.apache.catalina.ha.session.DeltaManager.messageReceived(DeltaManager.java:1320)
>>>>>>>     at org.apache.catalina.ha.session.DeltaManager.messageDataReceived(DeltaManager.java:1083)
>>>>>>>     at org.apache.catalina.ha.session.ClusterSessionListener.messageReceived(ClusterSessionListener.java:87)
>>>>>>
>>>>>> I only found this, which seems to have led you here:
>>>>>>
>>>>>> http://stackoverflow.com/questions/1326336/
>>>>>>
>>>>>> Maybe it is helpful to others who know about Tomcat internals.
>>>>>>
>>>>>> --
>>>>>> Michael Ludwig
>>>>>>
>>>>>> ---------------------------------------------------------------------
>>>>>> To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
>>>>>> For additional commands, e-mail: users-h...@tomcat.apache.org