Upgrading to Java 1.5.0_10 helped us. It contains fixes for memory leaks in
native memory, which do not show up in the Java heap.
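A quick way to check whether the growth really is outside the Java heap is to
compare the whole process's memory with what the JVM itself reports, along
these lines (<pid> being the Tomcat process id):

  ps -o pid,vsz,rss -p <pid>     # virtual/resident size of the JVM process
  jmap -heap <pid>               # heap usage as the JVM sees it

If rss keeps climbing while the jmap numbers stay flat, the leak is in native
memory rather than the heap.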

Ronald.

On Wed Feb 07 23:34:45 CET 2007 Tomcat Users List <users@tomcat.apache.org> wrote:
---------------------------------------------------------------------------
Version Information:
---------------------------------------------------------------------------
Red Hat Linux: 2.4.21-4.ELsmp #1 SMP i686 i686 i386 GNU/Linux
Apache: 2.0.54
mod_jk: 1.2.20
tomcat: 4.1.34
java: 1.5.0_06 (build 1.5.0_06-b05)


---------------------------------------------------------------------------
Problem description:
---------------------------------------------------------------------------

Under heavy load, after running for some time, some tomcat instances throw
OutOfMemoryErrors and then become unresponsive.
Running jmap -heap just after such an error occurs shows plenty of available
memory, and watching the heap before the tomcats get into this state reveals
no telltale growth in memory usage.

We currently have Xmx set to 256M; we have tried increasing and decreasing
this value and there is no change in the behavior.
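(For what it is worth, one way to watch the generations over time is to sample
them periodically, e.g. something like:

  jstat -gcutil <pid> 5000

which prints eden/survivor/old/perm utilization percentages every 5 seconds on
the 1.5.0 JVMs.)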

Questions: Is it possible that the initial OutOfMemoryError is thrown while a
large garbage collection is taking place, before the memory can be reclaimed,
throwing the whole system into some sort of bad state? If so, how do we combat
this behavior?

Is one of the tomcat threads running out of memory and killing itself, thus
freeing the memory? If this is what is happening, does anyone have any advice
on catching the tomcat memory usage just prior to going into a bad state?
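(On that last question: one way to capture the heap right at the failure is to
run with GC logging and an automatic heap dump turned on, e.g. in
CATALINA_OPTS/JAVA_OPTS:

  -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps
  -XX:+HeapDumpOnOutOfMemoryError

Note that the heap-dump flag only appeared in the later 1.5.0 updates.)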


Based on others' reports of similar problems, we have investigated a number
of system resources and their limits (file descriptors, threads, etc.) with
no luck. (Below are some statistics that seem to show that these other areas
are not the problem.)

One perhaps telling piece of information is that once a tomcat has gotten
into this state, we find that many connections to apache end up in the
SYN_RECV state (as reported by netstat). It appears that tomcat's listener
thread is still accepting connections, but something goes wrong in the
handoff to the processor threads such that the connection is left in
SYN_RECV. This is curious, as a stack trace of tomcat's threads reports many
(20+) processor threads in await(), waiting for the next thing to process.
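A per-state breakdown like the one further below can be produced with a
one-liner along these lines:

  netstat -an | grep 15009 | awk '{print $6}' | sort | uniq -c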

I have included as much relevant information as I can think of below, but am
happy to provide additional information should anyone have any ideas as to
what may be going wrong.


We would be very thankful to hear from anyone who has experience with similar
problems, or guidance on what to try as next steps.



---------------------------------------------------------------------------
System setup:
---------------------------------------------------------------------------

We are running apache to serve static content, and mod_jk with balanced
workers to forward requests for dynamic content to 5 tomcat instances. The
following are the relevant settings:


Apache settings (httpd.conf):

Timeout 300
KeepAlive On
MaxKeepAliveRequests 100

<IfModule prefork.c>
StartServers 5
MinSpareServers 5
MaxSpareServers 10
MaxClients 200
MaxRequestsPerChild 100
</IfModule>



mod_jk settings (worker.properties)

worker.list=loadbalancer

worker.tomcat1.port=11009
worker.tomcat1.host=localhost
worker.tomcat1.type=ajp13
worker.tomcat1.socket_timeout=30
worker.tomcat1.reply_timeout=60000
worker.tomcat1.connection_pool_size=1
worker.tomcat1.connection_pool_minsize=1
worker.tomcat1.connection_pool_timeout=300
worker.tomcat1.lbfactor=1
worker.tomcat1.recover_time=600

### (same settings for tomcat2 - tomcat5) ###

worker.loadbalancer.type=lb
worker.loadbalancer.balance_workers=tomcat1, tomcat2, tomcat3, tomcat4, tomcat5


tomcat settings (server.xml)

<Connector className="org.apache.ajp.tomcat4.Ajp13Connector"
port="11009" minProcessors="5" maxProcessors="100"
acceptCount="10" debug="0" enableLookups="false"/>
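Not shown above are the httpd.conf directives that hand requests off to the
balancer; they are the standard mod_jk ones, roughly as follows (the URL
pattern here is just an illustration, not our actual mapping):

  JkWorkersFile conf/worker.properties
  JkLogFile     logs/mod_jk.log
  JkMount       /app/* loadbalancer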




After running for some time (anywhere from 20 minutes to 12 hours depending
on load), we see one instance of tomcat stop responding. The following are
the errors reported in various logs.

---------------------------------------------------------------------------
catalina.out error messages (stderr/stdout from catalina.sh):
---------------------------------------------------------------------------

2007-02-07 12:43:22 Ajp13Processor[15009][3] process: invoke java.lang.OutOfMemoryError: Java heap space

---------------------------------------------------------------------------
Catalina.log error messages (log4j logging):
---------------------------------------------------------------------------

02/07 12:52:37 56012903 ERROR [Ajp13Processor[15009][11]] Got a throwable in XTAPIClient.java: java.lang.OutOfMemoryError: Java heap space

---------------------------------------------------------------------------
mod_jk.log error messages:
---------------------------------------------------------------------------
[Wed Feb 07 12:57:38 2007] [5368:49280] [info] jk_open_socket::jk_connect.c (451): connect to 127.0.0.1:15009 failed with errno=115

[Wed Feb 07 12:57:38 2007] [5368:49280] [info] ajp_connect_to_endpoint::jk_ajp_common.c (873): Failed opening socket to (127.0.0.1:15009) with (errno=115)

[Wed Feb 07 12:57:38 2007] [5368:49280] [info] ajp_send_request::jk_ajp_common.c (1259): (tomcat5) error connecting to the backend server (errno=115)

[Wed Feb 07 12:57:38 2007] [5368:49280] [info] ajp_service::jk_ajp_common.c (1916): (tomcat5) sending request to tomcat failed, recoverable operation attempt=1

[Wed Feb 07 12:57:47 2007] [5420:49280] [error] ajp_get_reply::jk_ajp_common.c (1545): (tomcat5) Timeout with waiting reply from tomcat. Tomcat is down, stopped or network problems.

[Wed Feb 07 12:57:47 2007] [5420:49280] [info] ajp_service::jk_ajp_common.c (1877): (tomcat5) receiving from tomcat failed, recoverable operation attempt=0

[Wed Feb 07 12:57:47 2007] [5420:49280] [info] ajp_service::jk_ajp_common.c (1916): (tomcat5) sending request to tomcat failed, recoverable operation attempt=1



After some time (~24 min in this case), errno changes from 115 (EINPROGRESS)
to 111 (ECONNREFUSED).



The following are various statistics gathered for a tomcat in such a bad
state:

---------------------------------------------------------------------------
jmap -heap report
---------------------------------------------------------------------------

Heap Configuration:
MinHeapFreeRatio = 40
MaxHeapFreeRatio = 70
MaxHeapSize = 268435456 (256.0MB)
NewSize = 655360 (0.625MB)
MaxNewSize = 4294901760 (4095.9375MB)
OldSize = 1441792 (1.375MB)
NewRatio = 8
SurvivorRatio = 8
PermSize = 16777216 (16.0MB)
MaxPermSize = 67108864 (64.0MB)

Heap Usage:
PS Young Generation
Eden Space:
capacity = 10223616 (9.75MB)
used = 4748760 (4.528770446777344MB)
free = 5474856 (5.221229553222656MB)
46.44892765925481% used
From Space:
capacity = 524288 (0.5MB)
used = 483360 (0.460968017578125MB)
free = 40928 (0.039031982421875MB)
92.193603515625% used
To Space:
capacity = 9109504 (8.6875MB)
used = 0 (0.0MB)
free = 9109504 (8.6875MB)
0.0% used
PS Old Generation
capacity = 238616576 (227.5625MB)
used = 162772160 (155.23162841796875MB)
free = 75844416 (72.33087158203125MB)
68.21494245227959% used
PS Perm Generation
capacity = 22675456 (21.625MB)
used = 22515408 (21.472366333007812MB)
free = 160048 (0.1526336669921875MB)
99.2941795746026% used


---------------------------------------------------------------------------
lsof -p <pid>
---------------------------------------------------------------------------

Reports 152 files open
45 of which are TCP connections

---------------------------------------------------------------------------
netstat -a | grep 15009
---------------------------------------------------------------------------
117 total connections
1 in LISTEN
17 in ESTABLISHED
44 in CLOSE_WAIT
17 in FIN_WAIT1
38 in SYN_RECV

---------------------------------------------------------------------------
threads
---------------------------------------------------------------------------
ps -flem | grep tomcat5 reports 41 threads

tomcat logging indicates 59 threads were started (2007-02-07 12:51:12 Ajp13Processor[15009][59] Starting background thread)

jstack has stack information for 30 threads


---------------------------------------------------------------------------
jstack
---------------------------------------------------------------------------

25 threads have a stack trace like this:

Thread 5266: (state = BLOCKED)
 - java.lang.Object.wait(long) @bci=0
 - java.lang.Object.wait() @bci=2, line=474
 - org.apache.ajp.tomcat4.Ajp13Processor.await() @bci=8, line=247
 - org.apache.ajp.tomcat4.Ajp13Processor.run() @bci=27, line=519
 - java.lang.Thread.run() @bci=11, line=595

These stack traces appear for one thread each:

Thread 18081: (state = IN_NATIVE)
 - java.net.SocketInputStream.socketRead0(...) @bci=0
 - java.net.SocketInputStream.read(...) @bci=84, line=129
 - org.gjt.mm.mysql.MysqlIO.readFully(...) @bci=31, line=1114
 - org.gjt.mm.mysql.MysqlIO.readPacket() @bci=80, line=891
 - org.gjt.mm.mysql.MysqlIO.clearAllReceive() @bci=14, line=990
 - org.gjt.mm.mysql.Connection.execSQL(...) @bci=291, line=882
 - org.gjt.mm.mysql.Connection.execSQL(...) @bci=5, line=815
 - org.apache.commons.dbcp.DelegatingConnection.isClosed() @bci=11, line=283
 - com.Xtensible.AsyncWeb.JDBCWrapper.doQuery(...) @bci=147, line=617

Thread 17937: (state = BLOCKED)


Thread 17936: (state = BLOCKED)
 - java.lang.Object.wait(long) @bci=0
 - java.lang.ref.ReferenceQueue.remove(long) @bci=44, line=116
 - java.lang.ref.Finalizer$FinalizerThread.run() @bci=3, line=159
Error occurred during stack walking:


Thread 17935: (state = BLOCKED)
 - java.lang.Object.wait(long) @bci=0
 - java.lang.Object.wait() @bci=2, line=474
Error occurred during stack walking:


Thread 17929: (state = IN_NATIVE)
 - java.net.PlainSocketImpl.socketAccept(java.net.SocketImpl) @bci=0
 - java.net.PlainSocketImpl.accept(java.net.SocketImpl) @bci=7, line=384
 - java.net.ServerSocket.implAccept(java.net.Socket) @bci=50, line=450
 - java.net.ServerSocket.accept() @bci=48, line=421
 - org.apache.catalina.core.StandardServer.await() @bci=74, line=517
 - org.apache.catalina.startup.Catalina.start() @bci=436, line=467
 - org.apache.catalina.startup.Catalina.execute() @bci=8, line=345
 - org.apache.catalina.startup.Catalina.process(java.lang.String[]) @bci=17, line=129
 - sun.reflect.NativeMethodAccessorImpl.invoke0(...) @bci=0
 - sun.reflect.NativeMethodAccessorImpl.invoke(...) @bci=87, line=39
 - sun.reflect.DelegatingMethodAccessorImpl.invoke(...) @bci=6, line=25
 - java.lang.reflect.Method.invoke(...) @bci=111, line=585
 - org.apache.catalina.startup.Bootstrap.main(...) @bci=564, line=150

Again, any help or information would be greatly appreciated.

Sharon French
Xtensible Technology Inc
195 McGregor St.
Suite 322
Manchester, NH 03102
603.641.8141 x202
http://www.mlspulse.com
