Sporadic errors during high load

2010-03-16 Thread Patrik Kudo
Hi all!

We run a fairly large web application which we're currently running load tests against, but we're 
experiencing sporadic errors whose cause we can't find.

We run a load test scenario using the Proxysniffer load testing tool on a 
machine connected to the same switch as the server under load. The load test 
simulates 3100 users looping over 27 pages of varying complexity. Each loop 
takes 2175 seconds on average and the average response time per page is 0.16 
seconds. The test runs for about 5 hours, and after a while (normally around 1 
hour, but sometimes after little more than 30 minutes and sometimes longer) 
occasional errors appear. The errors always come clustered, with a bunch on each 
occurrence. After each occurrence everything runs fine for a length of time 
until the next one.
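(Back of the envelope, that works out to roughly 3100 * 27 / 2175 ≈ 38 page requests per 
second, and with an average response time of 0.16 seconds only about 38 * 0.16 ≈ 6 requests 
in flight at any given moment, so the load itself should be well within what the machine can 
handle.)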

Proxysniffer reports all errors as "Network Connection aborted by Server", but 
when we look at each error in detail we can see that they don't all occur at 
the same stage in the request cycle. Some occur on "transmit http request", 
some on "open network connection", some on "wait for server response", but all 
within the same second.

On one of the tests we had a total of more than 300 requests and only 
14 errors, spread over 2 occasions, during the 5 hour test.

The problem is 100% reproducible with the current setup and the other setups we've 
tested, but the errors occur with some randomness.

The application logs show nothing unusual. The access logs show nothing 
unusual. We've included the session ids in the tomcat logs, and the failing URLs 
don't show up in the access log at all for the given session id (cookies are 
shown in the error report). 

During the test the machine is under some load, but I wouldn't call it heavy 
load. The application is quite database intensive so postgres works a lot 
harder than java/tomcat.

At first we used apache 2.2 with mod_jk in front of tomcat; the errors were more 
numerous then, and we got a bunch of errors in mod_jk.log stating that apache 
could not connect to tomcat. To be able to pinpoint the problem we've now 
excluded apache httpd and run only tomcat with the NIO HTTP connector. We've 
also tried the vanilla HTTP connector.
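
For reference, the NIO connector we're testing with is declared more or less like the sketch 
below in server.xml (the maxThreads/acceptCount/timeout numbers here are illustrative, not our 
exact values). One thing we're wondering is whether the accept backlog (acceptCount plus the 
kernel's kern.ipc.somaxconn limit) could fill up during a pause and cause connections to be 
dropped at the "open network connection" stage.

<!-- sketch only; thread/backlog numbers are illustrative -->
<Connector port="8080" protocol="org.apache.coyote.http11.Http11NioProtocol"
           maxThreads="400" acceptCount="200"
           connectionTimeout="20000" redirectPort="8443" />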

We've tried both the default garbage collector with default settings and 
the flags -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+CMSIncrementalMode. 
There is no significant difference in times or errors with either setting.

We've been able to match some of the errors with full collections reported by 
the flags -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps, but some 
errors occur when no full GC is running.



I'm running out of ideas here... What am I missing? What am I doing wrong? What 
could I try?



The full JVM flags are:

# general options
JAVA_OPTS="-server -Dbuild.compiler.emacs=true"
# Memory limits (we've tried both higher and lower values here)
JAVA_OPTS="${JAVA_OPTS} -XX:MaxPermSize=192m -Xmx1800m -Xms1800m"
# GC logging
JAVA_OPTS="${JAVA_OPTS} -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps"
# GC engine (tried with excluding this and using the default values)
#JAVA_OPTS="${JAVA_OPTS} -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+CMSIncrementalMode"
# GC tuning (tried with excluding these as well)
#JAVA_OPTS="${JAVA_OPTS} -Xmn2g -XX:ParallelGCThreads=8 -XX:SurvivorRatio=8 -XX:TargetSurvivorRatio=90 -XX:MaxTenuringThreshold=31"
# JVM options
JAVA_OPTS="${JAVA_OPTS} -Dfile.encoding=utf-8 -Djava.awt.headless=true"
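
For the next run we're considering sending the GC output to its own file with wall-clock 
timestamps so the pauses can be lined up against the error timestamps in the Proxysniffer 
report; something along these lines, assuming the diablo JDK accepts the date-stamp flag 
(the log path is just a placeholder):

# GC log to its own file with wall-clock timestamps (path is a placeholder)
JAVA_OPTS="${JAVA_OPTS} -Xloggc:/var/log/tomcat/gc.log -XX:+PrintGCDateStamps -XX:+PrintGCApplicationStoppedTime"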


Software involved:
FreeBSD 8.0-RELEASE-p2 with diablo-jdk1.6.0 (we also tried openjdk6). Tomcat 
6.0.26 (previously 6.0.20 with the same problem). The application uses 
org.apache.commons.dbcp.BasicDataSource to connect to postgresql 8.4.2 on the 
same machine. Most of the application uses hibernate and ehcache to access the 
database, but some parts use vanilla jdbc and some older parts still use a 
homebrew connection pool. We use spring for transaction management and 
autowiring of some handler/service objects.
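
In case the pool setup matters, the DBCP data source is wired up through spring roughly like 
the sketch below (driver, URL and pool sizes here are placeholders rather than our real 
settings); if anyone suspects pool exhaustion we can post the actual values.

<!-- placeholder values, not our production settings -->
<bean id="dataSource" class="org.apache.commons.dbcp.BasicDataSource"
      destroy-method="close">
  <property name="driverClassName" value="org.postgresql.Driver"/>
  <property name="url" value="jdbc:postgresql://localhost:5432/appdb"/>
  <property name="maxActive" value="50"/>
  <property name="maxWait" value="10000"/>
  <property name="validationQuery" value="SELECT 1"/>
</bean>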

Hardware:
16 CPU cores (Intel(R) Xeon(R) X5550  @ 2.67GHz)
32 GB RAM


Thanks in advance,
Patrik Kudo





Re: Sporadic errors during high load

2010-03-16 Thread Peter Crowther
Thanks for a comprehensive statement of the problem - so many people don't
include the basics, let alone the details!

A few thoughts inline.

On 16 March 2010 13:58, Patrik Kudo k...@pingpong.net wrote:

 We run a load test scenario using the Proxysniffer load testing tool on a
 machine connected to the same switch as the server under load. The load test
 simulates 3100 users


Why this number?  What happens if you increase it - does the incidence of
the problem increase?  This might make it easier to track down.


 looping over 27 pages of varying complexity.


Again, can you force the issue by tuning which pages are requested?


 Proxysniffer reports all errors as Network Connection aborted by Server
 but when we look at each error in detail we can see that they don't all
 occur at the same stage in the request cycle. Some occur on transmit http
 request, some on open network connection, some on wait for server
 response, but all within the same second.


It'd be interesting to run (say) Wireshark and sniff the TCP connections.
In particular, that sounds like TCP RSTs coming off the server but it would
be good to verify that and to see at which points in the negotiation they
happen.
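
A capture filtered down to connection setup and teardown on the Tomcat port should stay
manageably small even over a multi-hour run; something like the following on the server,
plus a look at the listen queue while the errors are happening (the interface name and
port below are guesses for your box):

# capture only SYN/FIN/RST packets on the Tomcat port
# ("em0" and 8080 are guesses; adjust for your machine)
tcpdump -i em0 -s 0 -w tomcat-errors.pcap \
    'tcp port 8080 and (tcp[tcpflags] & (tcp-syn|tcp-fin|tcp-rst) != 0)'
# current vs. maximum listen queue depth for the connector
netstat -Lan | grep 8080
# cumulative TCP statistics; look for listen queue overflows or drops
netstat -s -p tcp

If the RSTs line up with the error timestamps, the next question is whether Tomcat sent
them or the OS did (for example because the listen queue was full).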


 The application logs show nothing unusual. The access logs show nothing
 unusual. We've included the session ids in the tomcat logs and the failing
 urls doesn't show up in the access log at all for the given session id
 (cookies are shown in the error report).


That's interesting; I'll leave better-qualified people to comment on what
code paths this eliminates.


 We've been able to match some of the errors with full collections reported
 by the flags -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps but
 some errors occur where there are no full GC occuring.


How long do your full GCs take?
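
If the -verbose:gc output ends up in catalina.out (or a separate gc.log), something like
this will list the full collections with their pause times, which you can then line up
against the error timestamps (the path is a guess for your layout):

# list full collections with their pause times (the trailing "... secs" figures)
grep "Full GC" /usr/local/tomcat/logs/catalina.out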


 # Memory limits (we've tried both higher and lower values here)
 JAVA_OPTS=${JAVA_OPTS} -XX:MaxPermSize=192m -Xmx1800m -Xms1800m


That's a small part of a 32G machine, but you're seeing no out-of-memory
errors, so it's a sign of good design and coding ;-).

FreeBSD 8.0-RELEASE-p2 with diablo-jdk1.6.0 (we also tried openjdk6).


Can you tell us *exactly* which versions of the JDKs?
http://www.freebsd.org/java/ tells me 1.6.0-7 is current - sorry, I'm not as
well up on FreeBSD Java versions as on some other OSs.
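The exact output of java -version from that box would pin it down.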

- Peter