Listers,

Sun E10K, Solaris 2.7, 64 bit Oracle 8.1.7.2.?

I've dealt a little bit with reading deadlock graphs in trace files in the
past, and there is lots of information out there on Metalink (and
elsewhere), but I'm far from an expert on the subject. Ok, I'm quite a
novice and maybe have read and learned just enough to confuse myself ;-)
Anyway, if you get a deadlock and/or a distributed lock timeout (ORA-02049)
due to DML across a link on a remote table, is a trace file generated *when*
ITL issues are the cause?

A little background. 8 processes were inserting/updating/deleting 6 million
rows from a source DB across a DB link into a single table (215 million
rows, 34 partitions) on a remote DB. We soon found out that the table had
multiple bitmap indices (yes, that should have been known up front). One big
commit at the end of each process. Deadlock trace files with deadlock graphs
indicative of bitmap coverage issues were generated as well as the ORA-02049
error (distributed lock timeouts). The BMI's were dropped and the process
rerun overnight. Once again, the ORA-02049 error was generated for many of
the processes. The person responsible for the process noted that the PCTFREE
for each of the target table's partitions is 0 and INITRANS is 1. And there
are 8 simultaneous DML processes hitting that table. His suspicion is that
he may be encountering a deadlock due to ITL slots. Sounds very plausible
from what I have read. But, *no* deadlock trace files were generated on the
target system. In the case of the BMI coverage, deadlock trace files *were*
generated. So, I would have half expected trace files to be generated if
these were deadlocks due to ITL slot issues. I've asked that a TAR be opened
to ask this question.

DBA's confirm that there are zero transactions, batch jobs, processes, etc
running on the remote box that could account for a lock being held on the
object resulting in the distributed lock timeout. By the way, the now
undocumented parameter for distributed lock timeouts were upped to 300 some
few months ago but with the volume and a single commit at the end, locks
could be held for much longer than that. Still researching the various
causes of ORA-02049 as well. Oh yeah, one of the first questions asked was
could any of the 8 processes be trying to process the same row. This was
shot down -- a mod function on the PK value is used to for each process,
each using a different "seed" value, to avoid processing the same rows.
Tests were conducted to confirm and verify this.

In the meantime, event 20049 is being set in each session prior to doing the
remote DML. Maybe that will turn up more info -- I don't know since I've
never seen one. I guess we will see in the morning. If anyone has some
thoughts to share, I'm all ears. And why Oracle replication isn't being
used, and why Transportable Tablespaces, as well as other options aren't
being considered, is a long, and very political, story. And if some of the
info above is a bit sparse or doesn't make sense, it's because I've been
tossed into this and am not real familiar with all the details of the
process and much of the subject matter. But the folks are bound and
determined to make this process work and have asked for my thoughts.

Regards,

Larry G. Elkins
[EMAIL PROTECTED]
214.954.1781

-- 
Please see the official ORACLE-L FAQ: http://www.orafaq.com
-- 
Author: Larry Elkins
  INET: [EMAIL PROTECTED]

Fat City Network Services    -- (858) 538-5051  FAX: (858) 538-5051
San Diego, California        -- Public Internet access / Mailing Lists
--------------------------------------------------------------------
To REMOVE yourself from this mailing list, send an E-Mail message
to: [EMAIL PROTECTED] (note EXACT spelling of 'ListGuru') and in
the message BODY, include a line containing: UNSUB ORACLE-L
(or the name of mailing list you want to be removed from).  You may
also send the HELP command for other information (like subscribing).

Reply via email to