Larry,

Thanks for taking the time to explain this. You are really patient when it
comes to writing!
I see now how Oracle does the refresh. I also remember seeing a post on
this list some time ago asking about the possibility of tuning a similar
SQL. I think it was you :)

The way Oracle pushes the changes makes sense, since it cares nothing
about the transactions themselves and only about the final image of the
data (I thought it would care about the transactions).

As I see in the SQL you sent, all the statements use bind variables. This
indicates that changes are applied row by row in a procedural fashion instead
of by a single SQL statement (update, insert) that handles all the changes in
one shot.

When we talk about parallelizing it, there are two choices:
The first is using parallelism in the SQL engine itself and firing one
SQL statement that handles either all the updates or all the deletes to
the target table.

Parallel update is not available in releases below 9.2. Also, this
method gives no chance to handle any other processing (if there is any)
related to the refresh process or the rows refreshed during the refresh.

The second choice is to continue doing it procedurally, but spawn
many Oracle processes and let a master process
read the rows from the SQL and dequeue/distribute them to parallel job
processes.
The parallel job processes do their work the same way (procedurally) that
the single process does.
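As a rough illustration of that second choice, here is a minimal sketch (plain Python threads and a queue standing in for Oracle job processes; `apply_change` is a hypothetical placeholder for the per-row bind-variable DML):

```python
# Sketch of the master/worker pattern described above: a master reads the
# changed rows and enqueues them; worker threads dequeue and apply each
# change procedurally, just as a single refresh process would.
import queue
import threading

NUM_WORKERS = 4
SENTINEL = None  # tells a worker there is no more work


def apply_change(row):
    # Hypothetical placeholder for the per-row procedural work
    # (the bind-variable UPDATE/INSERT/DELETE seen in the trace).
    return f"applied {row}"


def worker(work_q, results):
    while True:
        row = work_q.get()
        if row is SENTINEL:
            break
        results.append(apply_change(row))


def parallel_refresh(changed_rows):
    work_q = queue.Queue()
    results = []
    threads = [threading.Thread(target=worker, args=(work_q, results))
               for _ in range(NUM_WORKERS)]
    for t in threads:
        t.start()
    for row in changed_rows:   # the "master" reads and distributes rows
        work_q.put(row)
    for _ in threads:          # one sentinel per worker to shut them down
        work_q.put(SENTINEL)
    for t in threads:
        t.join()
    return results
```

The point of the sketch is only the work-distribution shape; ordering guarantees between dependent changes are exactly the hard part this thread is about.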


So the first choice will not be available in any Oracle release before 9.2.
The second method, if it is available at all, would likely come only with a
big packaged Oracle option like replication.

I asked a friend who handles replication and here is the thread:

<<
 -----Original Message-----
Sent:   Wednesday, January 15, 2003 10:57 AM
To:     Khedr, Waleed
Subject:        RE: Question for you

True - this meaning of parallel propagation refers to the act of pushing
repgroups concurrently to multiple destinations. There is no way that I'm
aware of to make the queues be read by multiple background processes,
thereby pushed in parallel. 

Replication was never intended for large DSS systems and it certainly has
its limitations. ETL-type solutions may be a better road to take for DSS or
otherwise extremely large databases.

 -----Original Message-----
Sent:   Wednesday, January 15, 2003 10:46 AM
To:     my friend
Subject:        RE: Question for you

If the parallel push is associated with a rep group, then it may be there to
handle different tables concurrently, and it's not guaranteed that a rep
group with only one table will be pushed in parallel.

Am I right? Any thoughts?
>>

Regards,

Waleed

-----Original Message-----
Sent: Tuesday, January 14, 2003 8:25 PM
To: [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]


Waleed,

Thanks for chiming in.

Regarding refreshing multiple tables in parallel, well yes, that could be
done at the group level. Instead, though, they choose to have the scheduling
tool kick off multiple single-table refreshes. They start many at the same
time, so in essence they do get parallelism across multiple tables at once.
And this way they get paged for a specific table should something fail.

Regarding the other comments, the replication logic seems to handle the
dependency you are talking about. There really isn't any "order" per se, so
the same issues you raise would apply to serial processing as well, right?
All those rows are in the MLOG$ table with a 1/1/4000 date -- in what order
did they occur?

Obviously an insert has to occur before an update can occur (with
exceptions -- e.g. update, delete, then insert the same PK row, etc.). And we
don't care how many updates have occurred, since we are only interested in
pushing the current image of the row.

Now, what about deletes? Let's say you see deletes and inserts in the MLOG$
table. They will all have the 1/1/4000 date prior to kicking off. How
do we know which occurred first -- did we delete an existing row and then
re-insert? Or did we insert and then delete? Well, that's handled by joining
to the base table itself -- if the row still exists, then you are going to
push it; the insert would have happened after the delete. If it doesn't
exist, then you know the delete happened after the insert. So the deletes are
processed first, using the existence-in-the-base-table test.
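That existence test can be boiled down to a few lines. A minimal sketch, with plain Python sets and tuples standing in for MLOG$ rows and the base table (the function name and data shapes are mine, not Oracle's):

```python
# Delete-detection test described above: a key needs a DELETE pushed only
# when the log shows a non-insert change ('U' or 'D') AND the row no
# longer exists in the base table.

def keys_to_delete(log_entries, base_table_keys):
    """log_entries: iterable of (key, dmltype), dmltype in {'I','U','D'}."""
    base = set(base_table_keys)
    return {key for key, dmltype in log_entries
            if dmltype != 'I' and key not in base}


# Example: key 1 was deleted then re-inserted (still in base -> no delete
# pushed, the later insert/update step will carry its current image);
# key 2 was inserted then deleted (gone from base -> push the delete).
log = [(1, 'D'), (1, 'I'), (2, 'I'), (2, 'D')]
```

With the base table containing only key 1, only key 2 comes back as a delete to push, matching the "Step 1" query below.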

With tracing turned on you can see how the replication logic handles this:

Step 1 -- get the != 'I' rows that are not in the existing table -- i.e. we
know the last action was a delete, not an insert or update, since the row no
longer exists:

SELECT /*+ remote_mapped(link) */ DISTINCT
  LOG$."OBJECT_ID"
FROM
 (SELECT MLOG$."OBJECT_ID" FROM
  "SCHEMA"."MLOG$_MY_PTEST"@link MLOG$ WHERE
  "SNAPTIME$$" > :1 AND ("DMLTYPE$$" != 'I')) LOG$ WHERE (LOG$."OBJECT_ID")
  NOT IN (SELECT MAS_TAB$."OBJECT_ID" FROM
  "SCHEMA"."MY_PTEST"@LINK "MAS_TAB$" WHERE
  LOG$."OBJECT_ID" = MAS_TAB$."OBJECT_ID")

Step 2 -- process those deletes:

DELETE FROM "SCHEMA"."MV_MY_PTEST" SNAP$
WHERE
 "OBJECT_ID" = :1

Step 3 -- look for the != 'D' rows that *exist* in the table. These will be
the I's and U's. And because you handled the true deletes (as in, no longer
exists) in the prior step, you know these go:

SELECT /*+ remote_mapped(link) */ CURRENT$."OWNER",
  CURRENT$."OBJECT_NAME",CURRENT$."SUBOBJECT_NAME",CURRENT$."OBJECT_ID",
  CURRENT$."DATA_OBJECT_ID",CURRENT$."OBJECT_TYPE",CURRENT$."CREATED",
  CURRENT$."LAST_DDL_TIME",CURRENT$."TIMESTAMP",CURRENT$."STATUS",
  CURRENT$."TEMPORARY",CURRENT$."GENERATED",CURRENT$."SECONDARY"
FROM
 (SELECT /*+ */ "MY_PTEST"."OWNER" "OWNER","MY_PTEST"."OBJECT_NAME"
  "OBJECT_NAME","MY_PTEST"."SUBOBJECT_NAME" "SUBOBJECT_NAME",
  "MY_PTEST"."OBJECT_ID" "OBJECT_ID","MY_PTEST"."DATA_OBJECT_ID"
  "DATA_OBJECT_ID","MY_PTEST"."OBJECT_TYPE" "OBJECT_TYPE",
  "MY_PTEST"."CREATED" "CREATED","MY_PTEST"."LAST_DDL_TIME" "LAST_DDL_TIME",
  "MY_PTEST"."TIMESTAMP" "TIMESTAMP","MY_PTEST"."STATUS" "STATUS",
  "MY_PTEST"."TEMPORARY" "TEMPORARY","MY_PTEST"."GENERATED" "GENERATED",
  "MY_PTEST"."SECONDARY" "SECONDARY" FROM  "SCHEMA"."MY_PTEST"@LINK
"MY_PTEST") CURRENT$,
  (SELECT DISTINCT MLOG$."OBJECT_ID" FROM  "SCHEMA"."MLOG$_MY_PTEST"@LINK
MLOG$ WHERE
  "SNAPTIME$$" > :1 AND ("DMLTYPE$$" != 'D')) LOG$ WHERE
CURRENT$."OBJECT_ID"  = LOG$."OBJECT_ID"

Steps 4 and 5 (intermixed) -- process the inserts/updates. If you update
before you insert, no big deal: since you want the current image, the
insert will handle it, and an update that matches no row will not fail. And
if you insert before you update, no big deal, since you will still be
pushing the current image. So this goes:

UPDATE "SCHEMA"."MV_MY_PTEST" SET "OWNER" = :1,"OBJECT_NAME" = :2,
  "SUBOBJECT_NAME" = :3,"OBJECT_ID" = :4,"DATA_OBJECT_ID"= :5,"OBJECT_TYPE"
=
   :6,"CREATED" = :7,"LAST_DDL_TIME" = :8,"TIMESTAMP" = :9,"STATUS" = :10,
  "TEMPORARY" = :11,"GENERATED" = :12,"SECONDARY" = :13
WHERE "OBJECT_ID" = :4

INSERT INTO "SCHEMA"."MV_MY_PTEST"
("OWNER","OBJECT_NAME","SUBOBJECT_NAME",
  "OBJECT_ID","DATA_OBJECT_ID","OBJECT_TYPE","CREATED","LAST_DDL_TIME",
  "TIMESTAMP","STATUS","TEMPORARY","GENERATED","SECONDARY")
VALUES
 (:1,:2,:3,:4,:5,:6,:7,:8,:9,:10,:11,:12,:13)
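To see why the ordering of those two statements doesn't matter, here is a toy sketch (plain Python dicts standing in for the target table; the `STATUS` column value is just an example):

```python
# Why the update/insert order in steps 4 and 5 is harmless: every change
# carries the *current* image of the row, so either path converges to the
# same final state in the target.

def try_update(target, key, image):
    # UPDATE ... WHERE "OBJECT_ID" = :key -- touches 0 or 1 rows, never fails
    if key in target:
        target[key] = image
        return True
    return False


def apply_change(target, key, image):
    # If the update matched no row, the insert supplies the current image.
    if not try_update(target, key, image):
        target[key] = image


a = {}                               # row doesn't exist yet: insert path
apply_change(a, 4, {"STATUS": "VALID"})

b = {4: {"STATUS": "OLD"}}           # row already exists: update path
apply_change(b, 4, {"STATUS": "VALID"})
# both targets now hold the same current image for key 4
```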

Now I may have missed a few things, but it sounds like the logic handles the
dependencies. And what code and action may happen that doesn't manifest
itself as SQL? You see analyzes computing number of distinct values, max,
min, etc., but you don't see clustering and density factor calculations in
terms of SQL. I need to do some more work on this; the Advanced Replication
guide does talk about dependencies and their impact on whether or not some
things can be parallelized, but I've got to dig a bit more into that, while
still focusing on basic replication.

Now the fun part is doing the replication in the *same* DB. I could get
parallelism on the SELECTs, or at least the trace file said so, but it
wasn't observed, nor were the deletes/updates/inserts observed working in
parallel. So I still need to do some more digging and learning there.

Anyway, most of the replication I've done has been lower volume (with custom
code for high volumes). I'd like to get away from custom code for higher
volumes, if possible. Things like transportable tablespaces are an
opportunity, but really more appropriate for large actions -- low-volume
replication could be easier. Oh well, no more rambling.

Regards,

Larry G. Elkins
[EMAIL PROTECTED]
214.954.1781

> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]On Behalf Of Khedr,
> Waleed
> Sent: Tuesday, January 14, 2003 1:34 PM
> To: Multiple recipients of list ORACLE-L
> Subject: RE: Parallel Replication of Single Table
>
>
> Hi Larry,
>
> I do not know really how you want to parallelize this kind of operations!
>
> The final goal is to push the changes in site 1 to site 2 and get the data
> in both tables in sync.
>
> Changes include inserts, updates, and deletes. These changes (dml
> operations)
>  could be cascaded on the same row which means that the final image of the
> data in the table is completely dependent
> on the order and sequence of these dml operations.
>
> This is why I think that a single table refresh requires it to be a serial
> operation.
>
> But on the other hand if a single refresh was requested for many tables,
> then this could be parallelized on the job level not the table level.
>
> Does it make sense or am I missing something?
>
> Regards,
>
> Waleed
-- 
Please see the official ORACLE-L FAQ: http://www.orafaq.net
-- 
Author: Khedr, Waleed
  INET: [EMAIL PROTECTED]