Thanks for sharing. It may have been 'back in the 8.0 days', but it is
nonetheless a clever process.
Streams and Logminer may be available now, but I like the elegance
of the dual partition exchange.
Jared
| "Tanel Poder" <[EMAIL PROTECTED]>
Sent by: [EMAIL PROTECTED] 08/22/2003 09:34 AM
|
To: Multiple recipients of list ORACLE-L <[EMAIL PROTECTED]> cc: Subject: RE: (long) Design question, historic and views |
Hi!
To answer your original question about the design & DW transport: there is
too much to write to cover it completely, and there are too many different
ways to do the task.
I'll try to give you a reply from my past experience with OLTP -> DW
transfers (from up to 800GB OLTP systems to 2-3TB DWs).
1) Let's say we have a table EMP which we want to replicate to the DW.
2) EMP has a monotonically increasing timestamp/sequence column, so that
versions can be put in order and optimistic locking can be enforced.
3) There is a trigger on EMP which duplicates rows to the EMP2 table, based
on our rules (insert/update/delete in our case).
4) EMP2 is a range-partitioned table on the timestamp column, with a single
partition.
5) When we decide to transfer changes to the DW, we split the EMP2 table
into two partitions: one partition with all current rows in EMP2, and a
second partition for all values from max(timestamp in EMP2)+1 .. MAXVALUE.
6) We exchange the first partition with table EMP3 (all the rows in EMP2's
first partition go to the EMP3 table).
7) Now we can safely transport the changes to the DW staging area without
interrupting triggered inserts into EMP2 and without having to worry about
whether any new rows were inserted into EMP2 meanwhile.
8) We drop partition 1 of the EMP2 table, generating practically no redo and
leaving in EMP2 only the records inserted after the split.
9) And we start all over from step 5 again whenever we want to transport the
next set of changes. A SQL sketch of this cycle follows below.
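To make the cycle more concrete, here's a rough SQL sketch of steps 4-8.
The partition names (P_CUR, P_MAX), the column lists and the split value
are made up for illustration, so treat it as a sketch of the technique
rather than the exact DDL we ran:

  -- Step 4: EMP2 starts out range-partitioned with a single partition.
  -- Column names are illustrative; TS is the monotonic timestamp/sequence.
  CREATE TABLE emp2 (
    empno NUMBER,
    ename VARCHAR2(30),
    ts    NUMBER
  )
  PARTITION BY RANGE (ts) (
    PARTITION p_max VALUES LESS THAN (MAXVALUE)
  );

  -- EMP3 must match EMP2's structure for the exchange to work.
  CREATE TABLE emp3 (
    empno NUMBER,
    ename VARCHAR2(30),
    ts    NUMBER
  );

  -- Step 5: split at max(ts)+1 so all existing rows land in P_CUR and
  -- anything inserted afterwards goes to P_MAX. DDL can't take bind
  -- variables, so in practice you'd compute SELECT MAX(ts)+1 FROM emp2
  -- and build this statement dynamically; 1001 is just a placeholder.
  ALTER TABLE emp2 SPLIT PARTITION p_max AT (1001)
    INTO (PARTITION p_cur, PARTITION p_max);

  -- Step 6: swap the full partition with the empty EMP3 segment.
  ALTER TABLE emp2 EXCHANGE PARTITION p_cur WITH TABLE emp3;

  -- Step 7: transport EMP3 to the DW staging area (export, COPY,
  -- transportable tablespace - whatever fits).

  -- Step 8: drop the now-empty partition; practically no redo.
  ALTER TABLE emp2 DROP PARTITION p_cur;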
Btw, if you write your trigger accordingly, you can just update the master
table when a new version arrives and let the trigger handle copying the old
version to EMP2 - no deletes are required. It could even be possible to
write the trigger to update only those columns in the row which have
actually changed, to reduce the amount of rollback and redo, but this will
probably be harder on your CPU. Anyway, if you do so and your trigger gets
fairly large, it might be reasonable to put the code in a package, pin it
and call the package from the trigger. It's a matter of benchmarking.
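A minimal sketch of such a trigger, assuming hypothetical columns EMPNO,
ENAME and TS on EMP and EMP2 (the "only changed columns" and packaged
variants would build on the same skeleton):

  -- On update or delete, copy the *old* version of the row into EMP2;
  -- plain inserts need no history row. Column names are illustrative.
  CREATE OR REPLACE TRIGGER emp_hist_trg
  AFTER UPDATE OR DELETE ON emp
  FOR EACH ROW
  BEGIN
    INSERT INTO emp2 (empno, ename, ts)
    VALUES (:old.empno, :old.ename, :old.ts);
    -- If the trigger logic grows, move it into a (pinned) package and
    -- just call e.g. emp_hist_pkg.archive(:old.empno, ...) from here.
  END;
  /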
So, I just described a solution we used - this was back in the 8.0 days;
today there are plenty of other options, LogMiner and Streams for example.
OK, that's it for the transporting part.
I don't quite get where you want to place the views, and what their purpose
is - in your ODS, or in the DW?
Were you asking for a means to distinguish between current and old versions?
If in the ODS you keep your current and old version tables separate (EMP
vs. EMP2), then there's no problem - all current versions are in the EMP
table. But in the DW, where all records are together, you have two options
(the first two that come to my mind):
1) Modify the ETL process to update some column of the soon-to-be-old
record, setting current=N when the new record comes in. This means that you
have to find & update the old current version of a record every time you
insert a new version.
2) Do not modify the ETL process at all; use the timestamp column instead
(the timestamp/sqn is a monotonically increasing column), so whichever
record has the larger sequence# is the current one. There are some buts as
well: for example, if you want to keep deleted versions in your DW too,
then you could update the timestamp to 0 or similar. Also, depending on the
average number of versions, this might get quite slow if you aren't able to
use indexes properly (with a large number of versions you should use an
ascending index range scan instead of sorting). A sketch of both options
follows below.
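Here's a rough sketch of both options against a hypothetical DW table
EMP_DW(EMPNO, ENAME, TS, CURRENT_FLAG) - names invented for illustration:

  -- Option 1: ETL maintains a current flag; the old current version is
  -- flagged off before the new version is inserted.
  UPDATE emp_dw
  SET    current_flag = 'N'
  WHERE  empno = :new_empno
  AND    current_flag = 'Y';

  INSERT INTO emp_dw (empno, ename, ts, current_flag)
  VALUES (:new_empno, :new_ename, :new_ts, 'Y');

  -- Option 2: no flag at all; the row with the highest TS per key is
  -- the current one (ROW_NUMBER is available from 8i onwards).
  SELECT empno, ename, ts
  FROM  (SELECT e.*,
                ROW_NUMBER() OVER (PARTITION BY empno ORDER BY ts DESC) rn
         FROM   emp_dw e)
  WHERE  rn = 1;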
I hope this was what you were asking about.
This was my... erm... 3 cents (sync, sync, sync ;)
Tanel.
