Hi Yoav S and Tomcat Users, I am back with some issues again. This time I have gathered some solid observations and want to share them with all of you, so that I can get some good recommendations from the experts :-).
Oracle Connection Pool

We have observed that the number of connections during site outages goes beyond 1000. These 1000 connections are jointly held by 12 Tomcats, each Tomcat having its own pool. The min and max connections are set to 30 and 100 respectively. Sometimes the connections do not come back to the pool on their own, and after a while they cause the production server to stop responding.

Looking at the thread dumps of the respective Tomcat JVMs, we observed a lot of threads waiting on a monitor. We suspect this is a lock that all these threads are trying to acquire in order to close the logical connection, and the thread dumps seem to point to this too (not sure!!!). Since they are all stuck in the wait state for long periods, it seems there might be some issue with the Oracle connection pool. There is also an Oracle connection cleanup thread that runs, and it appears this thread is getting killed.

Moreover, we have recently upgraded from JDK 1.3.1 to JDK 1.4.2 and are wondering whether the JDBC drivers are compatible with the new JVM. We are on Oracle 8.1.7.3 with classes12.zip. I don't know whether this is the right mailing list, but I wanted to ask if there are any known bugs in classes12.zip and whether the newer drivers have fixed them. I would appreciate any links to such known issues with classes12.zip and to their fixes in the newer versions.
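As part of chasing the connections that never come back, we are auditing every data-access class against the close-in-finally pattern below. This is only a minimal sketch for discussion: the JNDI name jdbc/OracleDS, the class name and the query are placeholders I made up, not our real code.

$$ sketch: close-in-finally pattern (placeholders, not production code)
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import javax.naming.InitialContext;
import javax.naming.NamingException;
import javax.sql.DataSource;

public class PooledQueryExample {

    // Hypothetical JNDI name; substitute whatever the pool resource is actually called.
    private static final String DS_NAME = "java:comp/env/jdbc/OracleDS";

    public int countRows() throws NamingException, SQLException {
        DataSource ds = (DataSource) new InitialContext().lookup(DS_NAME);
        Connection con = null;
        Statement stmt = null;
        ResultSet rs = null;
        try {
            con = ds.getConnection();
            stmt = con.createStatement();
            rs = stmt.executeQuery("SELECT COUNT(*) FROM some_table"); // placeholder query
            rs.next();
            return rs.getInt(1);
        } finally {
            // Close in reverse order; each close is guarded so one failure
            // cannot prevent the connection from being returned to the pool.
            if (rs != null) try { rs.close(); } catch (SQLException ignored) {}
            if (stmt != null) try { stmt.close(); } catch (SQLException ignored) {}
            if (con != null) try { con.close(); } catch (SQLException ignored) {}
        }
    }
}
$$ End of sketch

If any code path can return or throw before Connection.close() runs, the pool never gets that connection back, which as far as I can tell would match the steady climb in connection counts we see during outages.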
Tomcat Configuration

The load balancing scheme does not seem good enough (though I still need to prove why). We reached this conclusion while checking the Apache access logs, which point to especially high load on Tomcats 4_3, 1_1 and 3_2 whenever a burst of traffic arrives. One more thing: 4_3 means <hostname>_<tomcat number>, as we have 4 machines running 4 Apaches and 12 Tomcats in total (see the earlier thread in the tomcat-user mail archive).

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
$$ snapshot of workers.properties file

worker.tc_1_1.port=8701
worker.tc_1_1.host=hostname1
worker.tc_1_1.type=ajp12
worker.tc_1_1.lbfactor=0.125

worker.tc_1_2.port=8702
worker.tc_1_2.host=hostname1
worker.tc_1_2.type=ajp12
worker.tc_1_2.lbfactor=0.125

worker.tc_1_3.port=8703
worker.tc_1_3.host=hostname1
worker.tc_1_3.type=ajp12
worker.tc_1_3.lbfactor=0.125

worker.tc_2_1.port=8704
worker.tc_2_1.host=hostname2
worker.tc_2_1.type=ajp12
worker.tc_2_1.lbfactor=0.125

worker.tc_2_2.port=8705
worker.tc_2_2.host=hostname2
worker.tc_2_2.type=ajp12
worker.tc_2_2.lbfactor=0.125

worker.tc_2_3.port=8706
worker.tc_2_3.host=hostname2
worker.tc_2_3.type=ajp12
worker.tc_2_3.lbfactor=0.125

worker.tc_3_1.port=8707
worker.tc_3_1.host=hostname3
worker.tc_3_1.type=ajp12
worker.tc_3_1.lbfactor=0.125

worker.tc_3_2.port=8708
worker.tc_3_2.host=hostname3
worker.tc_3_2.type=ajp12
worker.tc_3_2.lbfactor=0.125

worker.tc_3_3.port=8709
worker.tc_3_3.host=hostname3
worker.tc_3_3.type=ajp12
worker.tc_3_3.lbfactor=0.125

worker.tc_4_1.port=8710
worker.tc_4_1.host=hostname4
worker.tc_4_1.type=ajp12
worker.tc_4_1.lbfactor=0.125

worker.tc_4_2.port=8711
worker.tc_4_2.host=hostname4
worker.tc_4_2.type=ajp12
worker.tc_4_2.lbfactor=0.125

worker.tc_4_3.port=8712
worker.tc_4_3.host=hostname4
worker.tc_4_3.type=ajp12
worker.tc_4_3.lbfactor=0.125

#------ DEFAULT LOAD BALANCER WORKER DEFINITION ----------------------
worker.loadbalancer.type=lb

-- tomcat list used on hostname1
worker.loadbalancer.balanced_workers=tc_4_3,tc_4_2,tc_4_1,tc_3_3,tc_3_2,tc_3_1,tc_2_3,tc_2_2,tc_2_1,tc_1_3,tc_1_2,tc_1_1

-- tomcat list used on hostname2
worker.loadbalancer.balanced_workers=tc_1_1,tc_1_2,tc_1_3,tc_2_1,tc_2_2,tc_2_3,tc_3_1,tc_3_2,tc_3_3,tc_4_1,tc_4_2,tc_4_3

-- tomcat list used on hostname3
worker.loadbalancer.balanced_workers=tc_4_3,tc_4_2,tc_4_1,tc_3_3,tc_3_2,tc_3_1,tc_2_3,tc_2_2,tc_2_1,tc_1_3,tc_1_2,tc_1_1

-- tomcat list used on hostname4
worker.loadbalancer.balanced_workers=tc_1_1,tc_1_2,tc_1_3,tc_2_1,tc_2_2,tc_2_3,tc_3_1,tc_3_2,tc_3_3,tc_4_1,tc_4_2,tc_4_3

$$ End of snapshot
~~~~~~~~~~~~~~~~~~

Yes, ajp12 and 0.125 as the load balance factor... I need to know whether ajp13 is better (obviously it is, but still), and I can't make sense of what 0.125 is supposed to mean here. If required I can also collect the load on all the other Tomcats over one full day; just ask and I will post it.
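For comparison, here is roughly what the same definitions might look like with ajp13 workers. This is an untested sketch, not a recommendation: the ports (8801 onwards) are invented placeholders, each Tomcat would need a matching AJP/1.3 connector configured in its server.xml, and lbfactor=1 on every worker is just my assumption for equal weighting.

$$ sketch: possible ajp13 worker definitions (untested, placeholder ports)
worker.list=loadbalancer

worker.tc_1_1.port=8801
worker.tc_1_1.host=hostname1
worker.tc_1_1.type=ajp13
worker.tc_1_1.lbfactor=1

# ... repeat for tc_1_2 through tc_4_3, each on its own port ...

worker.loadbalancer.type=lb
worker.loadbalancer.balanced_workers=tc_1_1,tc_1_2,tc_1_3,tc_2_1,tc_2_2,tc_2_3,tc_3_1,tc_3_2,tc_3_3,tc_4_1,tc_4_2,tc_4_3
$$ End of sketch

My understanding is that only the relative lbfactor values matter, so twelve identical 0.125 values and twelve identical 1 values should distribute load the same way, but I would like someone to confirm that.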
Memory Leaks

Memory leaks in the application are continuously causing high memory usage on the production servers. One observation from the past week is that free memory shrinks day by day; this was seen in the 'top' display. Moreover, looking at the GC logs of the Tomcats, it was found that heap usage keeps rising on the production machines. Over time this rise has caused each JVM to occupy 1GB of RAM on the production servers. At one point this brought the free memory on Hostname1 down to as low as 182MB, on a machine with 4GB of total memory. I am attaching the GC logs, which should give you something concrete to comment on.

That's all for now... ;-)

Thanks in advance
~AC

-----Original Message-----
From: Shapira, Yoav [mailto:[EMAIL PROTECTED]
Sent: Friday, October 24, 2003 12:55 AM
To: Tomcat Users List
Subject: RE: Issues | Suggestion any?

Howdy,

I wish your mail program would quote properly -- it's much tougher to read your message without knowing what you wrote and what you quoted ;(

>Yes we are doing a -Xmx, -Xms, -XX:NewSize and not as I typed, sorry about
>the confusion. We are in the process of choosing one of the parallel GC
>algorithms with JVM 1.4.2 but don't have grounds to prove it would be better,
>only theoretical ones (I wish you could point me to some). But we need a
>parallel collector as we have 4 CPUs per machine, and that in fact would
>help along with some more parameters like compaction.

You would never be able to find GC numbers specific to your app. You must benchmark it yourself, on your servers, to get accurate numbers. Only that will tell you which GC settings are best. At the scale you're dealing with, the characteristics of your application matter a whole lot to GC performance.

>I don't have at present the idea about this number (778m). But, I was more

You should know why you're limiting the heap to that unusual number.

>interested to understand the output that pmap returns and am enclosing one
>with this mail right now:

I'm not a pmap expert, and I've never found it useful in debugging performance or GC problems. I'm sure other people on the list could help more in this area.

>The output marked in red, what does this actually signify??? Excess
>swapping???
>Why is the heap size in the pmap output not equal to the one under the SIZE
>column in 'top'???

Top lists a lot more than just the heap, so you can't compare the output from top to just the heap in another tool's output. There was nothing marked in red in your message, so I don't know what you mean by the red question.

>Anyway, what I meant was: if swapping is causing some problems here, do we
>require more memory or do we need to tune the application more? Comments???

When you look at top, it tells you how much swap space is being used. You want to minimize the amount of swap space used. If you have more physical memory, let the JVM use it by increasing the -Xmx setting.

>>You have your causality mixed up here. High GC does not cause a high
>>number of threads.
>
>Yeah, I think you are right that high thread counts don't cause the GC; it is
>the default % of the heap which, when filled, invokes the GC. But more or
>less all these threads account for database connections and some downloads.

That's not what I said: what I said and what you say above are both right, but not the same thing.

>>Why are they unable to come back? Are they not released back into the
>>pool? Do they go bad but not get abandoned by the pool?
>
>This is what I wanted to know... but it seems the connections are held.

No one can answer that except you. Run your app inside a profiler and see what holds references to the connection objects.

>The crash message and id were checked. This was found to be an active bug in
>Sun's bug database, but it seems they have corrected it in JDK 1.4.2. As we
>haven't had any sudden kills yet, it may have been solved, or...??

... or you've been lucky so far. You should have the latest OS patches for the JDK installed as a matter of proper procedure. Keep watching this, but if it's OK for now, focus more on the memory usage.

>Please help me find a way out of this or a checklist of what needs to be
>done/checked at this stage.

A checklist???? ;) ;)

Amusing. I used to go out with a girl who worked at Sapient. Are you up here in Boston?
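One way to run the side-by-side GC benchmark suggested above might be to drive the same load against the same Tomcat with different collector options and then compare the resulting GC logs. The option sets below are only a sketch: the heap sizes, log paths and the JAVA_OPTS variable are placeholders, and the exact flag names should be verified against Sun's 1.4.2 GC tuning documentation for the specific build before use.

$$ sketch: candidate JVM option sets for a GC benchmark (placeholder values)
# Baseline: current collector, with GC logging enabled so runs can be compared.
JAVA_OPTS="-Xms768m -Xmx768m -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:/path/to/gc_baseline.log"

# Candidate: parallel young-generation collector (the machines have 4 CPUs each).
JAVA_OPTS="-Xms768m -Xmx768m -XX:+UseParallelGC -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:/path/to/gc_parallel.log"

# Candidate: concurrent low-pause collector, if pause times turn out to be the problem.
JAVA_OPTS="-Xms768m -Xmx768m -XX:+UseConcMarkSweepGC -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:/path/to/gc_cms.log"
$$ End of sketch

JAVA_OPTS is just the conventional way the Tomcat startup scripts pass JVM options; use whatever mechanism the installation actually relies on.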
Yoav Shapira

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
