On 10/05/2010 18:58, Schoenemann, Ryan wrote:
> I have two Tomcat 6.0.16 servers set up in a cluster, running on a Windows
> 2003 VM as a Windows service, with Java version 1.6.0_10. After 10-14
> days of running, one of the Tomcat instances will start using 100% of the
> server CPU.

Can you upgrade to the latest version? 6.0.16 is getting on a bit...


p

> Through JConsole I see that the NioReceiver thread is the top CPU-using
> thread, whereas it is usually at the bottom with next to no CPU usage.
> When I restart the Tomcat6 Windows service everything goes back to
> normal, but a couple of days later the other server in the cluster will
> need to be restarted. I searched for similar occurrences, but I was only
> able to find a problem with the NIO selector while running on Linux, and
> it was supposed to be fixed in a previous build of 1.6.
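
JConsole's per-thread CPU view can also be reproduced with the standard java.lang.management API, which would let you log which thread is busy when the problem starts instead of having to catch it live. A minimal sketch, assuming it runs inside the affected JVM (for a remote Tomcat you would obtain the ThreadMXBean over JMX instead, e.g. via ManagementFactory.newPlatformMXBeanProxy); the class and method names are illustrative, not part of Tomcat:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.HashMap;
import java.util.Map;

public class TopCpuThread {

    /**
     * Samples per-thread CPU time twice, sampleMillis apart, and returns the
     * name of the thread whose CPU time grew the most in between.
     * Note: this inspects the JVM the code itself runs in.
     */
    static String busiestThread(long sampleMillis) throws InterruptedException {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        if (!mx.isThreadCpuTimeSupported()) {
            throw new UnsupportedOperationException("Thread CPU time is not supported");
        }
        if (!mx.isThreadCpuTimeEnabled()) {
            mx.setThreadCpuTimeEnabled(true);
        }

        // First sample: CPU time in nanoseconds for every live thread.
        Map<Long, Long> before = new HashMap<Long, Long>();
        for (long id : mx.getAllThreadIds()) {
            before.put(id, mx.getThreadCpuTime(id));
        }

        Thread.sleep(sampleMillis);

        // Second sample: keep the thread with the largest increase.
        String busiest = null;
        long maxDelta = -1;
        for (long id : mx.getAllThreadIds()) {
            Long then = before.get(id);
            long now = mx.getThreadCpuTime(id); // -1 if the thread has since died
            if (then == null || then < 0 || now < 0) {
                continue; // thread started or ended between the two samples
            }
            ThreadInfo info = mx.getThreadInfo(id);
            long delta = now - then;
            if (info != null && delta > maxDelta) {
                maxDelta = delta;
                busiest = info.getThreadName();
            }
        }
        return busiest;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("Busiest thread over 1s: " + busiestThread(1000));
    }
}
```

Run periodically and logged, something like this would show roughly when the NioReceiver thread starts to dominate, which could then be compared with the timing of the cluster messages further down.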
>
> I used the cluster setup from the Tomcat manual, except that I use
> synchronous replication:
>
> <Cluster className="org.apache.catalina.ha.tcp.SimpleTcpCluster"
>          channelSendOptions="4">
>
>   <Manager className="org.apache.catalina.ha.session.DeltaManager"
>            expireSessionsOnShutdown="false"
>            notifyListenersOnReplication="true" />
>
>   <Channel className="org.apache.catalina.tribes.group.GroupChannel">
>     <Membership className="org.apache.catalina.tribes.membership.McastService"
>                 address="228.0.0.4" port="45564"
>                 frequency="500" dropTime="3000" />
>     <Receiver className="org.apache.catalina.tribes.transport.nio.NioReceiver"
>               address="auto" port="4000"
>               autoBind="100" selectorTimeout="5000"
>               maxThreads="6" />
>     <Sender className="org.apache.catalina.tribes.transport.ReplicationTransmitter">
>       <Transport className="org.apache.catalina.tribes.transport.nio.PooledParallelSender" />
>     </Sender>
>     <Interceptor className="org.apache.catalina.tribes.group.interceptors.TcpFailureDetector" />
>     <Interceptor className="org.apache.catalina.tribes.group.interceptors.MessageDispatch15Interceptor" />
>   </Channel>
>
>   <Valve className="org.apache.catalina.ha.tcp.ReplicationValve" filter="" />
>   <Valve className="org.apache.catalina.ha.session.JvmRouteBinderValve" />
>
>   <ClusterListener className="org.apache.catalina.ha.session.JvmRouteSessionIDBinderListener" />
>   <ClusterListener className="org.apache.catalina.ha.session.ClusterSessionListener" />
> </Cluster>
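
For reference, channelSendOptions is a bit mask. The Tomcat cluster documentation lists Channel.SEND_OPTIONS_USE_ACK = 0x0002, Channel.SEND_OPTIONS_SYNCHRONIZED_ACK = 0x0004 and Channel.SEND_OPTIONS_ASYNCHRONOUS = 0x0008, so the value 4 above is the synchronous mode mentioned, while the cluster how-to example uses 8 (asynchronous). A small sketch that decodes a value; the constants are local copies of those documented values rather than an import from Tomcat, and the class name is illustrative:

```java
import java.util.ArrayList;
import java.util.List;

public class SendOptions {

    // Local copies of the flag values documented for
    // org.apache.catalina.tribes.Channel (not imported from Tomcat here).
    static final int SEND_OPTIONS_USE_ACK          = 0x0002;
    static final int SEND_OPTIONS_SYNCHRONIZED_ACK = 0x0004;
    static final int SEND_OPTIONS_ASYNCHRONOUS     = 0x0008;

    /** Lists which of the three documented flags are set in a channelSendOptions value. */
    static List<String> decode(int options) {
        List<String> flags = new ArrayList<String>();
        if ((options & SEND_OPTIONS_USE_ACK) != 0) {
            flags.add("USE_ACK");
        }
        if ((options & SEND_OPTIONS_SYNCHRONIZED_ACK) != 0) {
            flags.add("SYNCHRONIZED_ACK");
        }
        if ((options & SEND_OPTIONS_ASYNCHRONOUS) != 0) {
            flags.add("ASYNCHRONOUS");
        }
        return flags;
    }

    public static void main(String[] args) {
        System.out.println("4 -> " + decode(4)); // prints 4 -> [SYNCHRONIZED_ACK]
        System.out.println("8 -> " + decode(8)); // prints 8 -> [ASYNCHRONOUS]
        System.out.println("6 -> " + decode(6)); // prints 6 -> [USE_ACK, SYNCHRONIZED_ACK]
    }
}
```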
>
> I took a thread dump during the most recent occurrence:
>
> [2010-05-04 07:49:40] [info] "NioReceiver" daemon prio=6 tid=0x54f9b400 nid=0x2e8 runnable [0x5522f000..0x5522fa18]
>    java.lang.Thread.State: RUNNABLE
>         at sun.nio.ch.WindowsSelectorImpl$SubSelector.poll0(Native Method)
>         at sun.nio.ch.WindowsSelectorImpl$SubSelector.poll(Unknown Source)
>         at sun.nio.ch.WindowsSelectorImpl$SubSelector.access$400(Unknown Source)
>         at sun.nio.ch.WindowsSelectorImpl.doSelect(Unknown Source)
>         at sun.nio.ch.SelectorImpl.lockAndDoSelect(Unknown Source)
>         - locked <0x07563448> (a sun.nio.ch.Util$1)
>         - locked <0x07563458> (a java.util.Collections$UnmodifiableSet)
>         - locked <0x075633d0> (a sun.nio.ch.WindowsSelectorImpl)
>         at sun.nio.ch.SelectorImpl.select(Unknown Source)
>         at org.apache.catalina.tribes.transport.nio.NioReceiver.listen(NioReceiver.java:243)
>         at org.apache.catalina.tribes.transport.nio.NioReceiver.run(NioReceiver.java:353)
>         at java.lang.Thread.run(Unknown Source)
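
In a healthy receiver this thread spends most of its time blocked in select(), waiting up to selectorTimeout (5000 ms in the config above) for traffic. Threads in native code such as poll0 are always reported as RUNNABLE, so a single dump looks the same whether the thread is idle or spinning; combined with 100% CPU, though, it is consistent with select() returning immediately with nothing selected, over and over. One way to test that hypothesis is to time each select() call and count those that return 0 keys well before the timeout. A rough sketch; the "less than half the timeout" threshold is an arbitrary assumption, and the class and method names are illustrative:

```java
import java.io.IOException;
import java.nio.channels.Selector;

public class SelectorSpinCheck {

    /**
     * Heuristic: a select(timeout) that returns 0 keys in less than half the
     * timeout counts as "premature". The threshold is an arbitrary choice.
     * A wakeup() or a thread interrupt also causes a legitimate early
     * return, so a count is a hint, not proof.
     */
    static boolean isPremature(int selected, long elapsedMillis, long timeoutMillis) {
        return selected == 0 && elapsedMillis < timeoutMillis / 2;
    }

    /** Calls select(timeoutMillis) {@code rounds} times and counts premature returns. */
    static int countPrematureReturns(Selector selector, long timeoutMillis, int rounds)
            throws IOException {
        int premature = 0;
        for (int i = 0; i < rounds; i++) {
            long start = System.nanoTime();
            int selected = selector.select(timeoutMillis);
            long elapsedMillis = (System.nanoTime() - start) / 1000000L;
            if (isPremature(selected, elapsedMillis, timeoutMillis)) {
                premature++;
            }
            // Clear the selected set, as a real select loop would after processing keys.
            selector.selectedKeys().clear();
        }
        return premature;
    }

    public static void main(String[] args) throws IOException {
        Selector selector = Selector.open();
        try {
            // No channels are registered, so a healthy selector should block for
            // the full timeout each time and the count should be 0.
            int premature = countPrematureReturns(selector, 500, 5);
            System.out.println("premature returns: " + premature + " of 5");
        } finally {
            selector.close();
        }
    }
}
```

In a real receiver the channels would be registered, so a burst of premature zero-key returns while the CPU is pegged is the pattern to look for. Some NIO frameworks respond to that pattern by re-creating the Selector once such a count crosses a threshold; this sketch does not attempt that.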
>
> The only other thing I have noticed is that every evening, at around the
> same time, the following messages appear in the catalina log for 5-30
> minutes:
>
> Apr 28, 2010 6:47:16 PM org.apache.catalina.tribes.group.interceptors.TcpFailureDetector memberDisappeared
> INFO: Received memberDisappeared[org.apache.catalina.tribes.membership.MemberImpl[tcp://{10, -116, 111, 42}:4000,{10, -116, 111, 42},4000, alive=155973672,id={78 -71 -19 48 57 82 65 122 -80 52 -24 28 -126 95 77 27 }, payload={}, command={}, domain={}, ]] message. Will verify.
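
Tribes prints the member address as a Java byte array, and Java bytes are signed, so -116 is really 140 (0x8C). The member in these messages is therefore 10.140.111.42, receiver port 4000, which is the address to look at when checking the network path between the nodes. A small sketch of the conversion (the class name is illustrative; InetAddress.getByAddress gives the same result):

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

public class MemberAddress {

    /** Converts signed Java bytes, as printed in the Tribes log, to a dotted quad. */
    static String toDottedQuad(byte[] addr) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < addr.length; i++) {
            if (i > 0) {
                sb.append('.');
            }
            // Mask with 0xFF so a negative byte such as -116 becomes 140.
            sb.append(addr[i] & 0xFF);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws UnknownHostException {
        byte[] fromLog = {10, -116, 111, 42}; // as printed in the log above
        System.out.println(toDottedQuad(fromLog));                             // 10.140.111.42
        System.out.println(InetAddress.getByAddress(fromLog).getHostAddress()); // 10.140.111.42
    }
}
```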
>
> Apr 28, 2010 6:47:16 PM org.apache.catalina.tribes.transport.nio.NioReceiver socketTimeouts
> WARNING: Channel key is registered, but has had no interest ops for the last 3000 ms. (cancelled:false):sun.nio.ch.selectionkeyi...@a3ae07 last access:2010-04-28 18:47:10.283
>
> And this is the last message I see every day:
>
> Apr 28, 2010 6:47:29 PM org.apache.catalina.tribes.group.interceptors.TcpFailureDetector memberDisappeared
> INFO: Verification complete. Member still alive[org.apache.catalina.tribes.membership.MemberImpl[tcp://{10, -116, 111, 42}:4000,{10, -116, 111, 42},4000, alive=156009672,id={78 -71 -19 48 57 82 65 122 -80 52 -24 28 -126 95 77 27 }, payload={}, command={}, domain={}, ]]
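
Read together, these messages suggest that the multicast membership service stopped hearing heartbeats from the other node for longer than dropTime (3000 ms), and that TcpFailureDetector then checked the member over TCP on its receiver port and found it still alive. That would fit multicast traffic (228.0.0.4:45564) being interrupted at that time of the evening while unicast TCP still gets through, although the logs alone cannot prove it. A plain connect test, similar in spirit to the detector's check but not its actual code, could be scheduled around that time to see whether TCP connectivity ever drops as well. Sketch only; the class name and defaults are illustrative, and it only opens and closes a connection without sending anything:

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.Date;

public class ReceiverProbe {

    /** Returns true if a TCP connection to host:port is established within timeoutMillis. */
    static boolean canConnect(String host, int port, int timeoutMillis) {
        Socket socket = new Socket();
        try {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            return false; // refused, timed out, unreachable, ...
        } finally {
            try {
                socket.close();
            } catch (IOException ignored) {
                // nothing useful to do here
            }
        }
    }

    public static void main(String[] args) {
        // Defaults taken from the log above; pass host and port to override.
        String host = args.length > 0 ? args[0] : "10.140.111.42";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 4000;
        boolean ok = canConnect(host, port, 1000);
        System.out.println(new Date() + " " + host + ":" + port
                + (ok ? " reachable" : " NOT reachable"));
    }
}
```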
>
> I'm trying to track down what in our environment is preventing the two
> instances from communicating, and I'm not sure whether this is what
> causes the NioReceiver to use all the CPU.
>
> Any help identifying the cause of the CPU usage increase would be
> appreciated.
>
> Thanks,
>
> Ryan

