Hi - Based on your suggestion, which I do appreciate, we added timeouts for connect, prepost, and connection_pool, so now the worker looks like:
worker.list=loadbalancer
worker.loadbalancer.type=lb
worker.loadbalancer.balance_workers=cbap1,cbap2
worker.loadbalancer.sticky_session=1
worker.cbap1.port=8690
worker.cbap1.host= edited for privacy
worker.cbap1.type=ajp13
worker.cbap1.lbfactor=10
worker.cbap1.socket_keepalive=1
# worker.cbap1.cachesize=5
worker.cbap1.connect_timeout=10000
worker.cbap1.prepost_timeout=5000
worker.cbap1.connection_pool_timeout=7000

This has also helped very well with another issue we hadn't paid attention to before: connections. Our connection count was climbing far too high and not recovering after a Sun Cluster patch we applied. Now we see a nice, steady set of connections, which seems much better.

BUT... we had another issue today where one of my cluster members went into an error state and the whole cluster seemed hung. Summary of log findings: after running 10 days with decent activity, one of my workers had an issue. Events happened as follows:

1 - App1 becomes unstable with an OutOfMemoryError: PermGen space error:
    org.apache.catalina.core.StandardWrapperValve invoke
    SEVERE: Servlet.service() for servlet jsp threw exception
    java.lang.OutOfMemoryError: PermGen space
    We are reviewing max sizes for that to avoid it.
2 - App1 limps along, but as the new connection settings take effect I can see App1 is declared in an error state:
    [Mon Nov 10 15:47:43 2008] [4673:0001] [info] service::jk_lb_worker.c (906): service failed, worker cbap1 is in error state
3 - In my 1-web / 2-app-server cluster, application speed is severely impacted to the point of being unusable.
4 - Cannot shut down App1 gracefully. It has to be killed from the command prompt.
5 - Cluster performance returns now that App1 is dead.
6 - Restart App1.
7 - App2 sees that the member has joined but cannot establish a cluster with it. So now, with OSCache, we are trying to re-establish the cache of objects in memory management:
    Nov 10, 2008 4:05:52 PM com.opensymphony.oscache.plugins.clustersupport.JavaGroupsBroadcastingListener memberJoined
    INFO: A new member at address 'EDITED' has joined the cluster
    bufferedreader ready: false
    Nov 10, 2008 4:05:58 PM org.jgroups.protocols.FD_SOCK run
    SEVERE: socket address for EDITED could not be fetched, retrying
    fetchEstimateSpec = 413
    Nov 10, 2008 4:06:06 PM org.jgroups.protocols.FD_SOCK run
    SEVERE: socket address for EDITED could not be fetched, retrying
    Nov 10, 2008 4:06:15 PM org.jgroups.protocols.FD_SOCK run
8 - Now the cluster cannot manage memory properly, causing the cluster to sync on its objects, which is the fallback to clustering the in-memory objects. This is not desirable.
9 - Decide to stop Tomcat on App2. Cannot; it also has to be killed.
10 - Bring App2 up, and then the cluster rejoins normally. Running OK again.

So from the event that occurred on App1, I end up having to go through and eventually kill both app servers, which is not ideal.

Question: what else should I look at for handling a member that goes into error state, given that the whole cluster is still being taken down with it?

Thanks!

On Wed, Oct 15, 2008 at 12:48 AM, Mladen Turk <[EMAIL PROTECTED]> wrote:
> DHM wrote:
>
>> Hi -
>>
>> worker.list=loadbalancer
>> worker.loadbalancer.type=lb
>> worker.loadbalancer.balance_workers=cbap1,cbap2
>> worker.loadbalancer.sticky_session=1
>> worker.cbap1.port=8690
>> worker.cbap1.host= edited for privacy
>> worker.cbap1.type=ajp13
>> worker.cbap1.lbfactor=10
>> worker.cbap1.socket_keepalive=1
>> # worker.cbap1.cachesize=5
>>
>> worker.cbap2.port=8690
>> worker.cbap2.host=edited for privacy
>> worker.cbap2.type=ajp13
>> worker.cbap2.lbfactor=10
>> worker.cbap2.socket_keepalive=1
>> # worker.cbap2.cachesize=5
>>
>> I am looking for suggestions as to other configuration properties which
>> should be added to help with error handling of the worker in error
>> state. Other points to review are also welcome.
>> THANK YOU for any help here.
>
> You should add connect_timeout and prepost_timeout to
> each of the ajp workers. Those are exactly meant to
> be used with hanged Tomcats.
>
> Regards
> --
> ^(TM)
>
> ---------------------------------------------------------------------
> To start a new topic, e-mail: users@tomcat.apache.org
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
>
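[Editor's note: to make the advice in this thread concrete, here is a sketch of the workers.properties with the timeouts applied to both balanced workers, as Mladen's reply suggests ("each of the ajp workers"), plus reply_timeout and recovery_options, which are the mod_jk properties most directly aimed at the "worker in error state hangs the whole pool" symptom described above. The timeout values are the poster's own; the reply_timeout, recovery_options, and recover_time values are illustrative assumptions, not tuned recommendations, and the hosts remain edited for privacy as in the original.]

```properties
worker.list=loadbalancer
worker.loadbalancer.type=lb
worker.loadbalancer.balance_workers=cbap1,cbap2
worker.loadbalancer.sticky_session=1
# Seconds a failed worker stays in error state before mod_jk probes it again
# (illustrative value; 60 is the mod_jk default)
worker.loadbalancer.recover_time=60

worker.cbap1.port=8690
worker.cbap1.host= edited for privacy
worker.cbap1.type=ajp13
worker.cbap1.lbfactor=10
worker.cbap1.socket_keepalive=1
worker.cbap1.connect_timeout=10000
worker.cbap1.prepost_timeout=5000
worker.cbap1.connection_pool_timeout=7000
# Fail the request if Tomcat does not reply within 2 minutes, so a hung
# member is marked in error instead of tying up Apache threads (assumed value)
worker.cbap1.reply_timeout=120000
# Bitmask controlling when mod_jk retries a failed request on another worker;
# 3 = do not recover GET or POST requests (assumed setting)
worker.cbap1.recovery_options=3

# Same error-handling properties applied to the second worker
worker.cbap2.port=8690
worker.cbap2.host= edited for privacy
worker.cbap2.type=ajp13
worker.cbap2.lbfactor=10
worker.cbap2.socket_keepalive=1
worker.cbap2.connect_timeout=10000
worker.cbap2.prepost_timeout=5000
worker.cbap2.connection_pool_timeout=7000
worker.cbap2.reply_timeout=120000
worker.cbap2.recovery_options=3
```

Note that reply_timeout will also fail legitimately slow requests, so it should be set comfortably above the slowest expected response time for the application.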