----------------------------------------------------------------------------
---
  Version Information:
----------------------------------------------------------------------------
---
    Red Hat Linux:   2.4.21-4.ELsmp #1 SMP i686 i686 i386 GNU/Linux
           Apache:   2.0.54
           mod_jk:   1.2.20
           tomcat:   4.1.34
             java:   1.5.0_06 (build 1.5.0_06-b05)


----------------------------------------------------------------------------
---
  Problem description:
----------------------------------------------------------------------------
---

Under heavy load, after running for some time, some tomcat instances throw
OutOfMemoryErrors and then become unresponsive.  

Using jmap -heap just after such an error occurs reports plenty of available
memory.  Watching the heap before the tomcats get into this type of
situation  reveals no telltale growth in memory usage.

We currently have Xmx set to 256M, we have tried increasing and decreasing
this value and there is no change in the behavior.

Questions: 
        Is it possible that the initial OutOfMemory error occurs when a
large 
        garbage collection is taking place and an OutOfMemoryError is thrown

        before the memory can be reclaimed throwing the whole system into
some
        sort of bad state? If so, how do we combat this behavior?
        
        Is one of the tomcat threads running out of memory, and killing
itself,
        thus freeing the memory? If this is what is happening, does anyone
have 
        any advice on catching the tomcat memory usage just prior to going
into 
        a bad state?


Based on other's reports of similar problems we have investigated a number
of system resources and their limits (file descriptors, threads, etc) with
no luck.  (below are some statistics that seem to show that these other
areas are not the  problem).

One perhaps telling piece of information is that once a tomcat has gotten
into this state, we find that many connections to apache end up in the
SYN_RECV state  (as reported by netstat).  It appears that tomcat's listener
thread is still  accepting connections, but something goes wrong in the
handoff to the processor threads such that the connection is left in
SYN_RECV.  This is curious as a  stack trace of tomcat's threads report many
(20+) processor threads in await() waiting for the next thing to process.

I have included as much relevant information as I can think of below, but am
happy to  provide additional information should anyone have any ideas as to
what may be  going wrong.


We would be very thankful to hear from anyone who may have any experience of
similar problems, or guidance with what to try as next steps.


----------------------------------------------------------------------------
---
  Version Information:
----------------------------------------------------------------------------
---
    Red Hat Linux:   2.4.21-4.ELsmp #1 SMP i686 i686 i386 GNU/Linux
           Apache:   2.0.54
           mod_jk:   1.2.20
           tomcat:   4.1.34
             java:   1.5.0_06 (build 1.5.0_06-b05)

----------------------------------------------------------------------------
---
  System setup:
----------------------------------------------------------------------------
---

We are running apache to serve static content, and mod_jk with balanced
workers to forward requests for dynamic content to 5 tomcat instances.  The
following  are the relevant settings:
 

  Apache settings (httpd.conf):

        Timeout 300
        KeepAlive On
        MaxKeepAliveRequests 100

       <IfModule prefork.c>
           StartServers         5
           MinSpareServers      5
           MaxSpareServers     10
           MaxClients         200
           MaxRequestsPerChild  100
        </IfModule>


 
  mod_jk settings (worker.properties)

        worker.list=loadbalancer

        worker.tomcat1.port=11009
        worker.tomcat1.host=localhost
        worker.tomcat1.type=ajp13
        worker.tomcat1.socket_timeout=30
        worker.tomcat1.reply_timeout=60000
        worker.tomcat1.connection_pool_size=1
        worker.tomcat1.connection_pool_minsize=1
        worker.tomcat1.connection_pool_timeout=300
        worker.tomcat1.lbfactor=1
        worker.tomcat1.recover_time=600

        ### (same settings for tomcat2 - tomcat5) ###

        worker.loadbalancer.type=lb
        worker.loadbalancer.balance_workers=  tomcat1, tomcat2, tomcat3,
tomcat4, tomcat5


   tomcat settings (server.xml)

        <Connector className="org.apache.ajp.tomcat4.Ajp13Connector"
               port="11009" minProcessors="5" maxProcessors="100"
               acceptCount="10" debug="0" enableLookups="false"/>




After running for some time (anywhere from 20 minutes to 12 hours depending
on  load), we see one instance of tomcat stop responding.  The following are
the  errors reported in vairous logs.

----------------------------------------------------------------------------
---
  catalina.out error messages (stderr/stdout from catalina.sh):
----------------------------------------------------------------------------
---

   2007-02-07 12:43:22 Ajp13Processor[15009][3] 
        process: invoke java.lang.OutOfMemoryError: Java heap space

----------------------------------------------------------------------------
---
  Catalina.log error messages (log4j logging):
----------------------------------------------------------------------------
---

   02/07 12:52:37 56012903 ERROR [Ajp13Processor[15009][11]] 
        Got a throwable in XTAPIClient.java: java.lang.OutOfMemoryError: 
        Java heap space

----------------------------------------------------------------------------
---
  mod_jk.log error messages:
----------------------------------------------------------------------------
---
   [Wed Feb 07 12:57:38 2007] [5368:49280] 
        [info] jk_open_socket::jk_connect.c (451): connect to
127.0.0.1:15009 
                failed with errno=115

   [Wed Feb 07 12:57:38 2007] [5368:49280] 
        [info]  ajp_connect_to_endpoint::jk_ajp_common.c (873): Failed
opening 
                socket to (127.0.0.1:15009) with (errno=115)

   [Wed Feb 07 12:57:38 2007] [5368:49280] 
        [info]  ajp_send_request::jk_ajp_common.c (1259): (tomcat5) error
                connecting to the backend server (errno=115)

   [Wed Feb 07 12:57:38 2007] [5368:49280] 
        [info]  ajp_service::jk_ajp_common.c (1916): (tomcat5) sending
request
                to tomcat failed,  recoverable operation attempt=1

   [Wed Feb 07 12:57:47 2007] [5420:49280] 
        [error] ajp_get_reply::jk_ajp_common.c (1545): (tomcat5) Timeout
with
                waiting reply from tomcat. Tomcat is down, stopped or
network
                problems.

   [Wed Feb 07 12:57:47 2007] [5420:49280] 
        [info]  ajp_service::jk_ajp_common.c (1877): (tomcat5) receiving
from
                tomcat failed, recoverable operation attempt=0

   [Wed Feb 07 12:57:47 2007] [5420:49280] 
        [info]  ajp_service::jk_ajp_common.c (1916): (tomcat5) sending
request
                to tomcat failed,  recoverable operation attempt=1



After some time (~24 min in this case) errno changes from 115(EINPROGRESS)
to 111 (ECONNREFUSED)



The following are various statistics gathered for a tomcat in such a bad
state:

----------------------------------------------------------------------------
---
  jmap -heap report
----------------------------------------------------------------------------
---

        Heap Configuration:
           MinHeapFreeRatio = 40
           MaxHeapFreeRatio = 70
           MaxHeapSize      = 268435456 (256.0MB)
           NewSize          = 655360 (0.625MB)
           MaxNewSize       = 4294901760 (4095.9375MB)
           OldSize          = 1441792 (1.375MB)
           NewRatio         = 8
           SurvivorRatio    = 8
           PermSize         = 16777216 (16.0MB)
           MaxPermSize      = 67108864 (64.0MB)

        Heap Usage:
        PS Young Generation
        Eden Space:
           capacity = 10223616 (9.75MB)
           used     = 4748760 (4.528770446777344MB)
           free     = 5474856 (5.221229553222656MB)
           46.44892765925481% used
        From Space:
           capacity = 524288 (0.5MB)
           used     = 483360 (0.460968017578125MB)
           free     = 40928 (0.039031982421875MB)
           92.193603515625% used
        To Space:
           capacity = 9109504 (8.6875MB)
           used     = 0 (0.0MB)
           free     = 9109504 (8.6875MB)
           0.0% used
        PS Old Generation
           capacity = 238616576 (227.5625MB)
           used     = 162772160 (155.23162841796875MB)
           free     = 75844416 (72.33087158203125MB)
           68.21494245227959% used
        PS Perm Generation
           capacity = 22675456 (21.625MB)
           used     = 22515408 (21.472366333007812MB)
           free     = 160048 (0.1526336669921875MB)
           99.2941795746026% used


----------------------------------------------------------------------------
---
  lsof -p <pid>
----------------------------------------------------------------------------
---

        Reports 152 files open
        45 of which are TCP connections

----------------------------------------------------------------------------
---
  netstat -a | grep 15009
----------------------------------------------------------------------------
---
        117 total connections
          1 in LISTEN 
         17 in ESTALISHED
         44 in CLOSE_WAIT
         17 in FIN_WAIT1
         38 in SYN_RECV

----------------------------------------------------------------------------
---
  threads
----------------------------------------------------------------------------
---
        ps -flem | grep tomcat5 reports 41 threads
        
        tomcat logging indicates 59 threads were started 
        (2007-02-07 12:51:12 Ajp13Processor[15009][59] Starting background
thread)

        jstack has stack information for 30 threads


----------------------------------------------------------------------------
---
  jstack
----------------------------------------------------------------------------
---

   25 threads have a stack trace like this:

        Thread 5266: (state = BLOCKED)
         - java.lang.Object.wait(long) @bci=0 
         - java.lang.Object.wait() @bci=2, line=474 
         - org.apache.ajp.tomcat4.Ajp13Processor.await() @bci=8, line=247 
         - org.apache.ajp.tomcat4.Ajp13Processor.run() @bci=27, line=519 
         - java.lang.Thread.run() @bci=11, line=595 


    These stack traces appear for one thread each:

        Thread 18081: (state = IN_NATIVE)
         - java.net.SocketInputStream.socketRead0(...)  @bci=0 
         - java.net.SocketInputStream.read(...) @bci=84, line=129 
         - org.gjt.mm.mysql.MysqlIO.readFully(...) @bci=31, line=1114
         - org.gjt.mm.mysql.MysqlIO.readPacket() @bci=80, line=891 
         - org.gjt.mm.mysql.MysqlIO.clearAllReceive() @bci=14, line=990 
         - org.gjt.mm.mysql.Connection.execSQL(...) @bci=291, line=882 
         - org.gjt.mm.mysql.Connection.execSQL(...) @bci=5, line=815 
         - org.apache.commons.dbcp.DelegatingConnection.isClosed() @bci=11,
line=283 
         - com.Xtensible.AsyncWeb.JDBCWrapper.doQuery(...) @bci=147,
line=617
        
        Thread 17937: (state = BLOCKED)


        Thread 17936: (state = BLOCKED)
         - java.lang.Object.wait(long) @bci=0 
         - java.lang.ref.ReferenceQueue.remove(long) @bci=44, line=116 
         - java.lang.ref.Finalizer$FinalizerThread.run() @bci=3, line=159 
        Error occurred during stack walking:


        Thread 17935: (state = BLOCKED)
         - java.lang.Object.wait(long) @bci=0 
         - java.lang.Object.wait() @bci=2, line=474 
        Error occurred during stack walking:


        Thread 17929: (state = IN_NATIVE)
         - java.net.PlainSocketImpl.socketAccept(java.net.SocketImpl) @bci=0

         - java.net.PlainSocketImpl.accept(java.net.SocketImpl) @bci=7,
line=384 
         - java.net.ServerSocket.implAccept(java.net.Socket) @bci=50,
line=450 
         - java.net.ServerSocket.accept() @bci=48, line=421 
         - org.apache.catalina.core.StandardServer.await() @bci=74, line=517

         - org.apache.catalina.startup.Catalina.start() @bci=436, line=467 
         - org.apache.catalina.startup.Catalina.execute() @bci=8, line=345 
         - org.apache.catalina.startup.Catalina.process(java.lang.String[])
@bci=17, line=129       
         - sun.reflect.NativeMethodAccessorImpl.invoke0(...) @bci=0
         - sun.reflect.NativeMethodAccessorImpl.invoke(...) @bci=87, line=39

         - sun.reflect.DelegatingMethodAccessorImpl.invoke(...) @bci=6,
line=25
         - java.lang.reflect.Method.invoke(...) @bci=111, line=585      
         - org.apache.catalina.startup.Bootstrap.main(...) @bci=564,
line=150 


Again, any help or information would be greatly appreciated.

Sharon French
Xtensible Technology Inc
195 McGregor St.
Suite 322
Manchester, NH 03102
603.641.8141 x202
http://www.mlspulse.com

---------------------------------------------------------------------
To start a new topic, e-mail: users@tomcat.apache.org
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Reply via email to