Hi Yoav S and Tomcat Users, I am back with some issues again. This time I have gathered some solid observations and want to share them with all of you, so that I can get some good recommendations from the experts :-).
Oracle Connection Pool

We have observed that the number of connections during site outages goes beyond 1000. These 1000 connections are jointly held by 12 Tomcats, each Tomcat having its own pool. The min and max connections are set to 30 and 100 respectively. Sometimes the connections do not come back to the pool on their own, and after a while they cause the production server to stop responding.

Looking at the thread dumps of the respective Tomcat JVMs, we observed a lot of threads waiting on a monitor. We suspect this is a lock that all these threads are trying to acquire in order to close the logical connection, and the thread dumps seem to point to this too (not sure!!!). Since they are all stuck in the wait state for long periods, it seems there might be some issue with the Oracle connection pool. There is also an Oracle connection cleanup thread that runs, and it appears this thread is getting killed.

Moreover, we have recently upgraded from JDK 1.3.1 to JDK 1.4.2 and are wondering whether the JDBC drivers are compatible with the new JVM. We are on Oracle 8.1.7.3 with classes12.zip. I don't know whether this is the right mailing list, but I wanted to ask if there are any known bugs in classes12.zip and whether the newer drivers have fixed them. I would appreciate any links to such known issues with classes12.zip and to their fixes in the newer versions.
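As part of chasing the connections that never come back, we are auditing every data-access class against the close-in-finally pattern below. This is only a minimal sketch for discussion: the JNDI name jdbc/OracleDS, the class name and the query are placeholders I made up, not our real code.

$$ sketch: close-in-finally pattern (placeholders, not production code)
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import javax.naming.InitialContext;
import javax.naming.NamingException;
import javax.sql.DataSource;

public class PooledQueryExample {

    // Hypothetical JNDI name; substitute whatever the pool resource is actually called.
    private static final String DS_NAME = "java:comp/env/jdbc/OracleDS";

    public int countRows() throws NamingException, SQLException {
        DataSource ds = (DataSource) new InitialContext().lookup(DS_NAME);
        Connection con = null;
        Statement stmt = null;
        ResultSet rs = null;
        try {
            con = ds.getConnection();
            stmt = con.createStatement();
            rs = stmt.executeQuery("SELECT COUNT(*) FROM some_table"); // placeholder query
            rs.next();
            return rs.getInt(1);
        } finally {
            // Close in reverse order; each close is guarded so one failure
            // cannot prevent the connection from being returned to the pool.
            if (rs != null) try { rs.close(); } catch (SQLException ignored) {}
            if (stmt != null) try { stmt.close(); } catch (SQLException ignored) {}
            if (con != null) try { con.close(); } catch (SQLException ignored) {}
        }
    }
}
$$ End of sketch

If any code path can return or throw before Connection.close() runs, the pool never gets that connection back, which as far as I can tell would match the steady climb in connection counts we see during outages.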
Tomcat Configuration

The load balancing scheme does not seem good enough (though I still need to prove why). We reached this conclusion while checking the Apache access logs, which point to especially high load on Tomcats 4_3, 1_1 and 3_2 whenever a burst of traffic arrives. One more thing: 4_3 means <hostname>_<tomcat number>, as we have 4 machines running 4 Apaches and 12 Tomcats in total (see the earlier thread in the tomcat-user mail archive).

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
$$ snapshot of workers.properties file

worker.tc_1_1.port=8701
worker.tc_1_1.host=hostname1
worker.tc_1_1.type=ajp12
worker.tc_1_1.lbfactor=0.125

worker.tc_1_2.port=8702
worker.tc_1_2.host=hostname1
worker.tc_1_2.type=ajp12
worker.tc_1_2.lbfactor=0.125

worker.tc_1_3.port=8703
worker.tc_1_3.host=hostname1
worker.tc_1_3.type=ajp12
worker.tc_1_3.lbfactor=0.125

worker.tc_2_1.port=8704
worker.tc_2_1.host=hostname2
worker.tc_2_1.type=ajp12
worker.tc_2_1.lbfactor=0.125

worker.tc_2_2.port=8705
worker.tc_2_2.host=hostname2
worker.tc_2_2.type=ajp12
worker.tc_2_2.lbfactor=0.125

worker.tc_2_3.port=8706
worker.tc_2_3.host=hostname2
worker.tc_2_3.type=ajp12
worker.tc_2_3.lbfactor=0.125

worker.tc_3_1.port=8707
worker.tc_3_1.host=hostname3
worker.tc_3_1.type=ajp12
worker.tc_3_1.lbfactor=0.125

worker.tc_3_2.port=8708
worker.tc_3_2.host=hostname3
worker.tc_3_2.type=ajp12
worker.tc_3_2.lbfactor=0.125

worker.tc_3_3.port=8709
worker.tc_3_3.host=hostname3
worker.tc_3_3.type=ajp12
worker.tc_3_3.lbfactor=0.125

worker.tc_4_1.port=8710
worker.tc_4_1.host=hostname4
worker.tc_4_1.type=ajp12
worker.tc_4_1.lbfactor=0.125

worker.tc_4_2.port=8711
worker.tc_4_2.host=hostname4
worker.tc_4_2.type=ajp12
worker.tc_4_2.lbfactor=0.125

worker.tc_4_3.port=8712
worker.tc_4_3.host=hostname4
worker.tc_4_3.type=ajp12
worker.tc_4_3.lbfactor=0.125

#------ DEFAULT LOAD BALANCER WORKER DEFINITION ----------------------
worker.loadbalancer.type=lb

-- tomcat list used on hostname1
worker.loadbalancer.balanced_workers=tc_4_3,tc_4_2,tc_4_1,tc_3_3,tc_3_2,tc_3_1,tc_2_3,tc_2_2,tc_2_1,tc_1_3,tc_1_2,tc_1_1

-- tomcat list used on hostname2
worker.loadbalancer.balanced_workers=tc_1_1,tc_1_2,tc_1_3,tc_2_1,tc_2_2,tc_2_3,tc_3_1,tc_3_2,tc_3_3,tc_4_1,tc_4_2,tc_4_3

-- tomcat list used on hostname3
worker.loadbalancer.balanced_workers=tc_4_3,tc_4_2,tc_4_1,tc_3_3,tc_3_2,tc_3_1,tc_2_3,tc_2_2,tc_2_1,tc_1_3,tc_1_2,tc_1_1

-- tomcat list used on hostname4
worker.loadbalancer.balanced_workers=tc_1_1,tc_1_2,tc_1_3,tc_2_1,tc_2_2,tc_2_3,tc_3_1,tc_3_2,tc_3_3,tc_4_1,tc_4_2,tc_4_3

$$ End of snapshot
~~~~~~~~~~~~~~~~~~

Yes, ajp12 and 0.125 as the load balance factor... I need to know whether ajp13 is better (obviously it is, but still), and I can't make sense of what 0.125 is supposed to mean here. If required I can also collect the load on all the other Tomcats over one full day; just ask and I will post it.
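For comparison, here is roughly what the same definitions might look like with ajp13 workers. This is an untested sketch, not a recommendation: the ports (8801 onwards) are invented placeholders, each Tomcat would need a matching AJP/1.3 connector configured in its server.xml, and lbfactor=1 on every worker is just my assumption for equal weighting.

$$ sketch: possible ajp13 worker definitions (untested, placeholder ports)
worker.list=loadbalancer

worker.tc_1_1.port=8801
worker.tc_1_1.host=hostname1
worker.tc_1_1.type=ajp13
worker.tc_1_1.lbfactor=1

# ... repeat for tc_1_2 through tc_4_3, each on its own port ...

worker.loadbalancer.type=lb
worker.loadbalancer.balanced_workers=tc_1_1,tc_1_2,tc_1_3,tc_2_1,tc_2_2,tc_2_3,tc_3_1,tc_3_2,tc_3_3,tc_4_1,tc_4_2,tc_4_3
$$ End of sketch

My understanding is that only the relative lbfactor values matter, so twelve identical 0.125 values and twelve identical 1 values should distribute load the same way, but I would like someone to confirm that.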
Memory Leaks

Memory leaks in the application are continuously causing high memory usage on the production servers. One observation from the past week is that free memory shrinks day by day; this was seen in the 'top' display. Moreover, looking at the GC logs of the Tomcats, it was found that heap usage keeps rising on the production machines. Over time this rise has caused each JVM to occupy 1GB of RAM on the production servers. At one point this brought the free memory on Hostname1 down to as low as 182MB, on a machine with 4GB of total memory. I am attaching the GC logs, which should give you something concrete to comment on.

That's all for now... ;-)

Thanks in advance
~AC

-----Original Message-----
From: Shapira, Yoav [mailto:[EMAIL PROTECTED]
Sent: Friday, October 24, 2003 12:55 AM
To: Tomcat Users List
Subject: RE: Issues | Suggestion any?

Howdy,

I wish your mail program would quote properly -- it's much tougher to read your message without knowing what you wrote and what you quoted ;(

>Yes we are doing a -Xmx, -Xms, -XX:NewSize and not as I typed, sorry about
>the confusion. We are in the process of choosing one of the parallel GC
>algorithms with JVM 1.4.2 but don't have grounds to prove it would be better,
>only theoretical ones (I wish you could point me to some). But we need a
>parallel collector as we have 4 CPUs per machine, and that in fact would
>help along with some more parameters like compaction.

You would never be able to find GC numbers specific to your app. You must benchmark it yourself, on your servers, to get accurate numbers. Only that will tell you which GC settings are best. At the scale you're dealing with, the characteristics of your application matter a whole lot to GC performance.

>I don't have at present the idea about this number (778m). But, I was more

You should know why you're limiting the heap to that unusual number.

>interested to understand the output that pmap returns and am enclosing one
>with this mail right now:

I'm not a pmap expert, and I've never found it useful in debugging performance or GC problems. I'm sure other people on the list could help more in this area.

>The output marked in red, what does this actually signify??? Excess
>swapping???
>Why is the heap size in the pmap output not equal to the one under the SIZE
>column in 'top'???

Top lists a lot more than just the heap, so you can't compare the output from top to just the heap in another tool's output. There was nothing marked in red in your message, so I don't know what you mean by the red question.

>Anyway, what I meant was: if swapping is causing some problems here, do we
>require more memory or do we need to tune the application more? Comments???

When you look at top, it tells you how much swap space is being used. You want to minimize the amount of swap space used. If you have more physical memory, let the JVM use it by increasing the -Xmx setting.

>>You have your causality mixed up here. High GC does not cause a high
>>number of threads.
>
>Yeah, I think you are right that high thread counts don't cause the GC; it is
>the default % of the heap which, when filled, invokes the GC. But more or
>less all these threads account for database connections and some downloads.

That's not what I said: what I said and what you say above are both right, but not the same thing.

>>Why are they unable to come back? Are they not released back into the
>>pool? Do they go bad but not get abandoned by the pool?
>
>This is what I wanted to know... but it seems the connections are held.

No one can answer that except you. Run your app inside a profiler and see what holds references to the connection objects.

>The crash message and id were checked. This was found to be an active bug in
>Sun's bug database, but it seems they have corrected it in JDK 1.4.2. As we
>haven't had any sudden kills yet, it may have been solved, or...??

... or you've been lucky so far. You should have the latest OS patches for the JDK installed as a matter of proper procedure. Keep watching this, but if it's OK for now, focus more on the memory usage.

>Please help me find a way out of this or a checklist of what needs to be
>done/checked at this stage.

A checklist???? ;) ;)

Amusing. I used to go out with a girl who worked at Sapient. Are you up here in Boston?
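One way to run the side-by-side GC benchmark suggested above might be to drive the same load against the same Tomcat with different collector options and then compare the resulting GC logs. The option sets below are only a sketch: the heap sizes, log paths and the JAVA_OPTS variable are placeholders, and the exact flag names should be verified against Sun's 1.4.2 GC tuning documentation for the specific build before use.

$$ sketch: candidate JVM option sets for a GC benchmark (placeholder values)
# Baseline: current collector, with GC logging enabled so runs can be compared.
JAVA_OPTS="-Xms768m -Xmx768m -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:/path/to/gc_baseline.log"

# Candidate: parallel young-generation collector (the machines have 4 CPUs each).
JAVA_OPTS="-Xms768m -Xmx768m -XX:+UseParallelGC -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:/path/to/gc_parallel.log"

# Candidate: concurrent low-pause collector, if pause times turn out to be the problem.
JAVA_OPTS="-Xms768m -Xmx768m -XX:+UseConcMarkSweepGC -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:/path/to/gc_cms.log"
$$ End of sketch

JAVA_OPTS is just the conventional way the Tomcat startup scripts pass JVM options; use whatever mechanism the installation actually relies on.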
Yoav Shapira

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
