RE: mod_jk errors and http status code 503 under load

2007-09-10 Thread Damien Corbishley
Good morning Rainer,

bash-3.00# /usr/local/apache2/bin/httpd -V
Server version: Apache/2.0.55
Server built:   Mar  5 2007 11:10:17
Server's Module Magic Number: 20020903:11
Architecture:   32-bit
Server compiled with
 -D APACHE_MPM_DIR=server/mpm/worker
 -D APR_HAS_MMAP
 -D APR_HAVE_IPV6 (IPv4-mapped addresses enabled)
 -D APR_USE_FCNTL_SERIALIZE
 -D APR_USE_PTHREAD_SERIALIZE
 -D SINGLE_LISTEN_UNSERIALIZED_ACCEPT
 -D APR_HAS_OTHER_CHILD
 -D AP_HAVE_RELIABLE_PIPED_LOGS
 -D HTTPD_ROOT=/usr/local/apache2
 -D SUEXEC_BIN=/usr/local/apache2/bin/suexec
 -D DEFAULT_SCOREBOARD=logs/apache_runtime_status
 -D DEFAULT_ERRORLOG=logs/error_log
 -D AP_TYPES_CONFIG_FILE=conf/mime.types
 -D SERVER_CONFIG_FILE=conf/httpd.conf

I shall upgrade our mod_jk component and set the connection timeout
property.

I am currently using jmeter to try and re-produce the problem in our lab
before I update anything - just so I can be sure if I have solved the
issue.

May I get back to you later if I am still having issues?

Damien


-Original Message-
From: Rainer Jung [mailto:[EMAIL PROTECTED] 
Sent: Thursday, September 06, 2007 4:52 PM
To: Tomcat Users List
Subject: Re: mod_jk errors and http status code 503 under load

Damien Corbishley wrote:
 Hi Rainer,
  
 Apache:
 Server version: Apache/2.0.55
 Server built:   Mar  5 2007 11:10:17
 Server's Module Magic Number: 20020903:11
 
 Yes, we are thinking about updating the Java, I was in the process of
 checking the release notes to see if anything specific was mentioned.
 
 Mod JK:
 mod_jk/1.2.15

OK, 1.2.15 is a little antique. At the moment we reached 1.2.25.

 Worker or prefork MPM? - I don't understand the question, sorry.

MPM = Multi processing module
prefork = multiple processes, but only one thread per process
worker = multiple processes with a couple of threads per process 
(default=25)

The strings prefork or worker are also cintained in the output of httpd

-V. Example:

...
Server compiled with
  -D APACHE_MPM_DIR=server/mpm/prefork
...

 Connector port=8009
enableLookups=false redirectPort=8443
 protocol=AJP/1.3
acceptCount=50
minSpareThreads=240
maxThreads=950
backlog=50
address=127.0.0.1
debug=0
/

You should have a connectionTimeout.

 Tomcat is installed on the same machine as Apache.
 
 With respect to the number of connections to the tomcat port I can't
 give a complete answer - looking at the crash I can see there are
around
 4000 stuck at TIME_WAIT. Another 30 or so are at ESTABLISHED.

All of those 4000 TIME_WAIT for port 8009? That would be very much, 
indicating, that you are rapidly opening and closing the connections 
between Apache/mod_jk and tomcat. Those connections are supposed to be 
reused for some time.

 The mod jk configuration:

This file only consits of comments?

 ## THIS FILE MAY BE OVERRIDEN AT RUNTIME. MAKE SURE TOMCAT IS STOPED
 ## WHEN YOU EDIT THE FILE.
 
 ## COMMENTS WILL BE _LOST_
 
 ## DOCUMENTATION OF THE FORMAT IN JkMain javadoc.
 
 # Set the desired handler list
 # handler.list=apr,request,channelJni
 #
 # Override the default port for the socketChannel
 # channelSocket.port=8019
 # Default:
 # channelUnix.file=${jkHome}/work/jk2.socket
 # Just to check if the the config  is working
 # shm.file=${jkHome}/work/jk2.shm
 
 # In order to enable jni use any channelJni directive
 # channelJni.disabled = 0
 # And one of the following directives:
 
 # apr.jniModeSo=/opt/apache2/modules/mod_jk2.so
 
 # If set to inprocess the mod_jk2 will Register natives itself
 # This will enable the starting of the Tomcat from mod_jk2
 # apr.jniModeSo=inprocess
 


 bash-3.00# cat ./callflow-server/webserver/conf/workers.properties
 workers.apache_log=/usr/local/apache2

This attribute never existed (so it's ignored).

 workers.tomcat_home=/opt/tomcat
 workers.java_home=/usr/j2se/

Thos two don't have a meaning any more.

 worker.list= worker1
 
 worker.worker1.type=ajp13
 
 worker.worker1.host=127.0.0.1
 worker.worker1.port=8009
 worker.worker1.lbfactor=2.5

Not needed, if you are not using a JK load balancer. If you upgrade and 
later want to use a JK load balancer: newer JK versions use integer 
numbers here.

 # Number of concurrent users.
 worker.worker1.cachesize=950
 worker.worker1.cache_timeout=600

So set connectionTimeout in the AJP connector (server.xml) to 60 
(Milliseconds).

 worker.worker1.socket_timeout=300
 
 Yes, I agree with you about contacting SUN, I'm digging out our
service
 plan at the moment.

I don't see a good reason, why you should get those error messages for 
this configuration. But you should update to a more recent mod_jk in 
order to research this problem from a better starting point (and maybe 
it will be gone).

 Damien
 
 
 -Original Message-
 From: Rainer Jung [mailto:[EMAIL PROTECTED] 
 Sent: Thursday, September 06, 2007 3:42 PM
 To: Tomcat Users List
 Subject: Re

mod_jk errors and http status code 503 under load

2007-09-06 Thread Damien Corbishley
Hi, 

 I'm getting stuck with an error we are seeing on our production server
under load. 

 I've googled and found many posts on similar topics but I haven't found
an answer yet, 

 I was hoping somebody here could shed some light on my issue:

 

We upgraded our combination of Apache,Tomcat,Java all running on Solaris
10 to 

Apache 2_2.0.55.build2

Tomcat 5.5.23

Java 1.5.0_06-b05

 

Our Apache Tomcat connector is AJP/1.3.

 

It's all installed on the same server, we have no tomcat clusters and no
load balancers.

 

With a few (10) concurrent users accessing the various JSP's everything
worked as expected, all our functional tests passed.

We opened more up live traffic to the server and starting seeing the
following error message:

 

[Thu Sep 06 02:23:00 2007] [error] ajp_service::jk_ajp_common.c (1758):
Error connecting to tomcat. Tomcat is probably not started or is
listening on the wrong port. worker=worker1 failed

 

On the client side this was being reported as a 503 and our
intelligent browser was re-submitting the request and generally, 

on the second time, the request would be successful. 

After an hour we then had a Solaris panic, and after analyzing the crash
using scat I can see that the cause was:

 

panic string:   BAD TRAP: type=9 rp=2a1015332d0 addr=0 mmu_fsr=0

 

and looking further into the trap, it seems to be network related.

 

trapip:tcp_conn_request+0x894 (0x4, 0x2000, 0x136ec00, 0x3a2b9a44040,
0x40e03960300, 0x40dedc247c0)

ip:squeue_drain+0xe4 (0x300015ddec0, 0x40e2021c1c0, 0x411666de,
0x135f2c0, 0x3a2b9a44040, 0x3)

ip:squeue_enter+0x358 (0x300015ddec0, 0x38fca5e0d80, 0x3000426b640,
0x3a2bd674c00, 0x0, 0x1)

ip:tcp_wput - frame recycled

 

I am still trying to work out if the crash is a symptom of the browsers
resubmitting the failed requests and we've hit a Solaris bug that we
haven't got a patch for,

Or if the two errors are unrelated and it was coincidence that we
panicked whilst having mod_jk errors. 

 

 

Any help/pointers in resolving the mod_jk error would be appreciated, I
included the brief crash information as it may be relevant. 

Thanks

 

Damien


This e-mail transmission may contain information that is proprietary, 
privileged and/or confidential and is intended exclusively for the person(s) to 
whom it is addressed. Any use, copying, retention or disclosure by any person 
other than the intended recipient or the intended recipient's designees is 
strictly prohibited. If you are the intended recipient, you must treat the 
information in confidence and in accordance with all laws related to the 
privacy and confidentiality of such information.  If you are not the intended 
recipient or their designee, please notify the sender immediately by return 
e-mail and delete all copies of this email, including all attachments.

Intervoice, Inc. 17811 Waterview Parkway Dallas, TX 75252 USA

Intervoice Limited, Registered in England and Wales with number 2601740, 50 
Park Road, Gatley, Cheshire, SK8 4HZ. VAT Number: 560421375

Intervoice GmbH, Hagenauer Straße 55, 65203 Wiesbaden, Sitz der Gesellschaft: 
Wiesbaden, Handelsregister: HRB 8186 (Amtsgericht Wiesbaden), Geschäftsführer: 
Richard Arnold.

RE: mod_jk errors and http status code 503 under load

2007-09-06 Thread Damien Corbishley
Hi Rainer,
 
Apache:
Server version: Apache/2.0.55
Server built:   Mar  5 2007 11:10:17
Server's Module Magic Number: 20020903:11

Yes, we are thinking about updating the Java, I was in the process of
checking the release notes to see if anything specific was mentioned.

Mod JK:
mod_jk/1.2.15

Worker or prefork MPM? - I don't understand the question, sorry.

Connector port=8009
   enableLookups=false redirectPort=8443
protocol=AJP/1.3
   acceptCount=50
   minSpareThreads=240
   maxThreads=950
   backlog=50
   address=127.0.0.1
   debug=0
   /

Tomcat is installed on the same machine as Apache.

With respect to the number of connections to the tomcat port I can't
give a complete answer - looking at the crash I can see there are around
4000 stuck at TIME_WAIT. Another 30 or so are at ESTABLISHED.

The mod jk configuration:

## THIS FILE MAY BE OVERRIDEN AT RUNTIME. MAKE SURE TOMCAT IS STOPED
## WHEN YOU EDIT THE FILE.

## COMMENTS WILL BE _LOST_

## DOCUMENTATION OF THE FORMAT IN JkMain javadoc.

# Set the desired handler list
# handler.list=apr,request,channelJni
#
# Override the default port for the socketChannel
# channelSocket.port=8019
# Default:
# channelUnix.file=${jkHome}/work/jk2.socket
# Just to check if the the config  is working
# shm.file=${jkHome}/work/jk2.shm

# In order to enable jni use any channelJni directive
# channelJni.disabled = 0
# And one of the following directives:

# apr.jniModeSo=/opt/apache2/modules/mod_jk2.so

# If set to inprocess the mod_jk2 will Register natives itself
# This will enable the starting of the Tomcat from mod_jk2
# apr.jniModeSo=inprocess


bash-3.00# cat ./callflow-server/webserver/conf/workers.properties
workers.apache_log=/usr/local/apache2
workers.tomcat_home=/opt/tomcat
workers.java_home=/usr/j2se/

worker.list= worker1

worker.worker1.type=ajp13

worker.worker1.host=127.0.0.1
worker.worker1.port=8009
worker.worker1.lbfactor=2.5

# Number of concurrent users.
worker.worker1.cachesize=950
worker.worker1.cache_timeout=600
worker.worker1.socket_timeout=300

Yes, I agree with you about contacting SUN, I'm digging out our service
plan at the moment.

Damien


-Original Message-
From: Rainer Jung [mailto:[EMAIL PROTECTED] 
Sent: Thursday, September 06, 2007 3:42 PM
To: Tomcat Users List
Subject: Re: mod_jk errors and http status code 503 under load

Damien Corbishley wrote:
 Hi,
 
 I'm getting stuck with an error we are seeing on our production
 server under load.
 
 We upgraded our combination of Apache,Tomcat,Java all running on
 Solaris 10 to
 
 Apache 2_2.0.55.build2

This doesn't sound like a real Apache httpd version. You can check the 
real version with httpd -V or when using log level info in the error 
log of httpd.

Worker or prefork MPM?

 Tomcat 5.5.23
 
 Java 1.5.0_06-b05

If you are in a general update situation, switch to 1.5.0_12 or whatever

seems to be recent and not only a couple days old.

And finally mod_jk-version?

 Our Apache Tomcat connector is AJP/1.3.

What's the configuration of the connector element in server.xml?

 It's all installed on the same server, we have no tomcat clusters and
 no load balancers.
 
 With a few (10) concurrent users accessing the various JSP's
 everything worked as expected, all our functional tests passed.
 
 We opened more up live traffic to the server and starting seeing the 
 following error message:
 
 [Thu Sep 06 02:23:00 2007] [error] ajp_service::jk_ajp_common.c
 (1758): Error connecting to tomcat. Tomcat is probably not started or
 is listening on the wrong port. worker=worker1 failed

Is Tomcat on the same machine, or on another one?

What do you get for netstat -an| grep TOMCATPORT where you replace 
TOMCATPORT with the port number your Tomcat AJP connector listens on?

How many ESTABLISHED, how many of other states (SYN_SENT, CLOSE_WAIT
etc.)?

What's you mod_jk configuration?

 On the client side this was being reported as a 503 and our 
 intelligent browser was re-submitting the request and generally,
 
 on the second time, the request would be successful.
 
 After an hour we then had a Solaris panic, and after analyzing the
 crash using scat I can see that the cause was:

Solaris panic looks very bad. I would definitely open a bug with sun. 
I'm not saying that the original problem is there, but it is very 
unusual, that the whole system panics.

 panic string:   BAD TRAP: type=9 rp=2a1015332d0 addr=0 mmu_fsr=0
 
 and looking further into the trap, it seems to be network related.
 
 trapip:tcp_conn_request+0x894 (0x4, 0x2000, 0x136ec00,
 0x3a2b9a44040, 0x40e03960300, 0x40dedc247c0)
 
 ip:squeue_drain+0xe4 (0x300015ddec0, 0x40e2021c1c0, 0x411666de, 
 0x135f2c0, 0x3a2b9a44040, 0x3)
 
 ip:squeue_enter+0x358 (0x300015ddec0, 0x38fca5e0d80, 0x3000426b640, 
 0x3a2bd674c00, 0x0, 0x1)
 
 ip:tcp_wput - frame recycled
 
 
 
 I am still trying to work out if the crash is a symptom of the
 browsers