|
Oracle 8.1.5.1 Solaris 2.6 Hi there, For the last few days one of our servers
will suddenly experience ridiculously high load averages (50-65) and all
applications grind to a halt. Restarting the database solves the problem
and shutting down a specific apps servers solves the
problem. This apps server generally has
about 2-3 JDBC connections to the database.
I just started looking at it today (it's not one of my databases) and when
it recurred I started checking wait events. The top 10 wait events for about a 2 minute
period were as follows: EVENT Delta
Waits Delta Timeouts latch free 317376
185591 SQL*Net message from client 66435 0 SQL*Net message to client 66419 0 db file sequential read 21709 0 SQL*Net more data to client 18066 0 file open 1223
0 rdbms ipc message 576 170 log file sync 519 1 log file parallel write 374 0 log file sequential read 343 0 It looks like a latch contention
problem. By the time I drilled down to
specific sessions that were experiencing the "latch free" wait the
apps server had been restarted. My query
showed most waiting on the "process allocation" latch but I don't
know for sure that this was the latch that was causing the problem during the
high load period. I prepared this query to run the next time
the problem occurs in order to identify which latches are being waited on: select s.sid,s.username,s.program,s.status, se.event,se.total_waits,se.total_timeouts, se.time_waited,se.average_wait,sw.p2,v.name from v$session s, v$session_event se, v$session_wait
sw,v$latch v where s.sid=se.sid and s.sid=sw.sid and sw.p2=l.latch# and s.status='ACTIVE' AND
S.USERNAME IS NOT NULL and se.event='latch free'; Does this make sense? Any suggestions for other things
to check? Some reading suggests
that latch contention is often caused by not using bind variables and that
increasing db_block_lru_latches (currently = 2) can
alleviate the problem while the code is being fixed. But the question is why this problem
suddenly happens and the problem doesn't go away until the apps server is shut
down. The old version of the application
(which didn't use JDBC) did not have this problem. I'm theorizing that perhaps JDBC handles its
connections differently and after waiting for some period of time resends the
transaction and keeps resending it faster than Oracle can clean things up? Has anyone experienced a similar problem or
have other suggestions of where to go from here? Thanks, Jay Miller |
