On 10/05/2010 18:58, Schoenemann, Ryan wrote:
> I have 2 Tomcat 6.0.16 servers set up in a cluster running on a Windows
> 2003 VM as a windows service, with java version 1.6.0_10. After 10 - 14
> days of running one of the Tomcat instances will start using 100% of the
> server CPU.
Can you upgrade to the latest version? 6.0.16 is getting on a bit...
p
> Through JConsole I see that the NioReceiver thread is the top CPU-using
> thread, where it is usually at the bottom with next to no CPU usage.
> When I restart the Tomcat6 windows service everything goes back to
> normal, but a couple of days later the other server in the cluster will
> need to be restarted. I searched for similar occurrences but I was only
> able to find a problem with the NIO selector while running on Linux, and
> it was supposed to be fixed in a previous build of 1.6.
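
The per-thread CPU figures JConsole shows are also available from the platform ThreadMXBean, which helps if you want to log them when the problem starts. A minimal sketch, assuming it runs inside the JVM you care about (for example from a scratch JSP or a scheduled task). The class name TopCpuThreads and its Usage holder are made up for illustration and are not part of Tomcat:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper: ranks live threads of the current JVM by CPU time.
class TopCpuThreads {

    // One row of the report; sorts with the highest CPU time first.
    static final class Usage implements Comparable<Usage> {
        final String name;
        final long cpuNanos;

        Usage(String name, long cpuNanos) {
            this.name = name;
            this.cpuNanos = cpuNanos;
        }

        public int compareTo(Usage o) {
            return o.cpuNanos < cpuNanos ? -1 : (o.cpuNanos == cpuNanos ? 0 : 1);
        }
    }

    // Returns cumulative CPU time per live thread, busiest first.
    // The list is empty if the JVM does not support or enable the measurement.
    static List<Usage> snapshot() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        List<Usage> result = new ArrayList<Usage>();
        if (!mx.isThreadCpuTimeSupported() || !mx.isThreadCpuTimeEnabled()) {
            return result;
        }
        for (long id : mx.getAllThreadIds()) {
            long cpu = mx.getThreadCpuTime(id);     // -1 if the thread has died
            ThreadInfo info = mx.getThreadInfo(id); // null if the thread has died
            if (cpu >= 0 && info != null) {
                result.add(new Usage(info.getThreadName(), cpu));
            }
        }
        Collections.sort(result);
        return result;
    }

    public static void main(String[] args) {
        for (Usage u : snapshot()) {
            System.out.println((u.cpuNanos / 1000000) + " ms  " + u.name);
        }
    }
}
```

The values are cumulative since each thread started, so take two snapshots a few seconds apart and compare them. The thread whose number grows fastest is the one burning CPU.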
>
>
>
> I used the cluster setup from the tomcat manual, with the exception of
> using synchronous replication.
>
>
>
> <Cluster className="org.apache.catalina.ha.tcp.SimpleTcpCluster"
> channelSendOptions="4">
>
> <Manager
> className="org.apache.catalina.ha.session.DeltaManager"
>
> expireSessionsOnShutdown="false"
> notifyListenersOnReplication="true" />
>
> <Channel
> className="org.apache.catalina.tribes.group.GroupChannel">
>
> <Membership
> className="org.apache.catalina.tribes.membership.McastService"
>
> address="228.0.0.4" port="45564"
> frequency="500" dropTime="3000" />
>
> <Receiver
> className="org.apache.catalina.tribes.transport.nio.NioReceiver"
>
> address="auto" port="4000"
> autoBind="100" selectorTimeout="5000"
>
> maxThreads="6" />
>
> <Sender
> className="org.apache.catalina.tribes.transport.ReplicationTransmitter">
>
> <Transport
> className="org.apache.catalina.tribes.transport.nio.PooledParallelSender" />
>
> </Sender>
>
> <Interceptor
> className="org.apache.catalina.tribes.group.interceptors.TcpFailureDetector" />
>
> <Interceptor
> className="org.apache.catalina.tribes.group.interceptors.MessageDispatch15Interceptor" />
>
> </Channel>
>
> <Valve
> className="org.apache.catalina.ha.tcp.ReplicationValve" filter="" />
>
> <Valve
> className="org.apache.catalina.ha.session.JvmRouteBinderValve" />
>
> <ClusterListener
> className="org.apache.catalina.ha.session.JvmRouteSessionIDBinderListener" />
>
> <ClusterListener
> className="org.apache.catalina.ha.session.ClusterSessionListener" />
>
> </Cluster>
>
>
>
> I took a thread dump during the most recent occurrence:
>
>
>
> [2010-05-04 07:49:40] [info] "NioReceiver"
>
> [2010-05-04 07:49:40] [info] daemon
>
> [2010-05-04 07:49:40] [info] prio=6 tid=0x54f9b400
>
> [2010-05-04 07:49:40] [info] nid=0x2e8
>
> [2010-05-04 07:49:40] [info] runnable
>
> [2010-05-04 07:49:40] [info] [0x5522f000..0x5522fa18]
>
> [2010-05-04 07:49:40] [info] java.lang.Thread.State: RUNNABLE
>
> [2010-05-04 07:49:40] [info] at
> sun.nio.ch.WindowsSelectorImpl$SubSelector.poll0(Native Method)
>
> [2010-05-04 07:49:40] [info] at
> sun.nio.ch.WindowsSelectorImpl$SubSelector.poll(Unknown Source)
>
> [2010-05-04 07:49:40] [info] at
> sun.nio.ch.WindowsSelectorImpl$SubSelector.access$400(Unknown Source)
>
> [2010-05-04 07:49:40] [info] at
> sun.nio.ch.WindowsSelectorImpl.doSelect(Unknown Source)
>
> [2010-05-04 07:49:40] [info] at
> sun.nio.ch.SelectorImpl.lockAndDoSelect(Unknown Source)
>
> [2010-05-04 07:49:40] [info] - locked <0x07563448>
>
> [2010-05-04 07:49:40] [info] (a sun.nio.ch.Util$1)
>
> [2010-05-04 07:49:40] [info] - locked <0x07563458>
>
> [2010-05-04 07:49:40] [info] (a java.util.Collections$UnmodifiableSet)
>
> [2010-05-04 07:49:40] [info] - locked <0x075633d0>
>
> [2010-05-04 07:49:40] [info] (a sun.nio.ch.WindowsSelectorImpl)
>
> [2010-05-04 07:49:40] [info] at
> sun.nio.ch.SelectorImpl.select(Unknown Source)
>
> [2010-05-04 07:49:40] [info] at
> org.apache.catalina.tribes.transport.nio.NioReceiver.listen(NioReceiver.java:243)
>
> [2010-05-04 07:49:40] [info] at
> org.apache.catalina.tribes.transport.nio.NioReceiver.run(NioReceiver.java:353)
>
> [2010-05-04 07:49:40] [info] at java.lang.Thread.run(Unknown Source)
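
The dump only shows the thread sitting in select(), which is where it would be whether it is idle or spinning. One way to tell the two apart in a standalone test is to time each select(timeout) call: a call that returns zero ready keys long before the timeout expired was woken up for nothing, and a steady stream of those would show up as 100% CPU. A rough sketch under that assumption. SelectorSpinCheck, its method names, and the half-the-timeout threshold are hypothetical, and this is not how NioReceiver itself is written:

```java
import java.io.IOException;
import java.nio.channels.Selector;

// Hypothetical diagnostic: counts select() calls that return early with no ready keys.
class SelectorSpinCheck {

    // A wakeup is treated as premature if nothing was ready and less than half
    // of the timeout had elapsed. The threshold is an arbitrary choice.
    static boolean isPrematureWakeup(int readyKeys, long elapsedMillis, long timeoutMillis) {
        return readyKeys == 0 && elapsedMillis < timeoutMillis / 2;
    }

    // Calls select(timeoutMillis) the given number of times and counts premature wakeups.
    // timeoutMillis must be positive: select(0) blocks indefinitely.
    // Note that Selector.wakeup() or a thread interrupt also produce early zero-key
    // returns, so a few hits are normal; only a continuous run suggests a spin.
    static int countPrematureWakeups(Selector selector, long timeoutMillis, int rounds)
            throws IOException {
        int premature = 0;
        for (int i = 0; i < rounds; i++) {
            long start = System.nanoTime();
            int ready = selector.select(timeoutMillis);
            long elapsedMillis = (System.nanoTime() - start) / 1000000L;
            if (isPrematureWakeup(ready, elapsedMillis, timeoutMillis)) {
                premature++;
            }
        }
        return premature;
    }

    public static void main(String[] args) throws IOException {
        Selector selector = Selector.open();
        try {
            // No channels are registered, so a healthy selector should simply time out.
            int hits = countPrematureWakeups(selector, 200, 3);
            System.out.println("premature wakeups: " + hits);
        } finally {
            selector.close();
        }
    }
}
```

With selectorTimeout="5000" as in the config above, a healthy receiver with no traffic should come back from select() roughly once every five seconds, not thousands of times per second.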
>
>
>
>
>
> The only other thing I have noticed is that every evening around the
> same time I see the following messages posted in the catalina log for 5
> - 30 minutes:
>
>
>
> Apr 28, 2010 6:47:16 PM org.apache.catalina.tribes.group.interceptors.TcpFailureDetector memberDisappeared
>
> INFO: Received memberDisappeared[org.apache.catalina.tribes.membership.MemberImpl[tcp://{10, -116, 111, 42}:4000,{10, -116, 111, 42},4000, alive=155973672,id={78 -71 -19 48 57 82 65 122 -80 52 -24 28 -126 95 77 27 }, payload={}, command={}, domain={}, ]] message. Will verify.
>
>
>
> Apr 28, 2010 6:47:16 PM org.apache.catalina.tribes.transport.nio.NioReceiver socketTimeouts
>
> WARNING: Channel key is registered, but has had no interest ops for the last 3000 ms. (cancelled:false):sun.nio.ch.selectionkeyi...@a3ae07 last access:2010-04-28 18:47:10.283
>
>
>
>
>
> And this is the last message I see every day:
>
>
>
> Apr 28, 2010 6:47:29 PM org.apache.catalina.tribes.group.interceptors.TcpFailureDetector memberDisappeared
>
> INFO: Verification complete. Member still alive[org.apache.catalina.tribes.membership.MemberImpl[tcp://{10, -116, 111, 42}:4000,{10, -116, 111, 42},4000, alive=156009672,id={78 -71 -19 48 57 82 65 122 -80 52 -24 28 -126 95 77 27 }, payload={}, command={}, domain={}, ]]
>
>
>
>
>
> I'm trying to track down what in our environment is causing the two
> instances not to be able to communicate, and I'm not sure if this is
> what causes the NioReceiver to use all the CPU.
>
>
>
> Any help identifying what is causing the CPU usage increase would be
> appreciated.
>
>
>
>
>
> Thanks,
>
>
>
> Ryan
