On 10/27/2010 1:48 PM, Scott Carey wrote:
It is almost always significantly faster than a direct bulk load into a table.
* The temp table has no indexes, while the final table usually does; bulk operations on indexes are faster than per-row operations.
* The final table might require both updates and inserts, and doing these in bulk from a temp stage table is far faster than doing them per row.
* You don't even have to commit after the merge from the temp table; you can loop until it's all done and then commit -- though this can have table/index bloat implications if doing updates.
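To make the merge step concrete: since Postgres has no single merge statement, it can be done as a bulk update followed by a bulk insert of the missing rows. A sketch only, with made-up table and column names ("stage" is the temp table, "target" the final table):

-- update the rows that already exist in the final table
update target t
   set val = s.val
  from stage s
 where t.id = s.id;

-- insert the rows that are not there yet
insert into target (id, val)
select s.id, s.val
  from stage s
 where not exists (select 1 from target t where t.id = s.id);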
Scott, I find this very hard to believe. If you are inserting into a
temporary table and then into the target table, you will do 2 inserts
instead of just one. What you are telling me is that it is faster for me
to drive from NYC to Washington DC by driving first to Miami and then
from Miami to DC.
2) This is what I had in mind:
mgogala=# create table a(c1 int);
CREATE TABLE
mgogala=# create temporary table t1(c1 int) on commit delete rows;
CREATE TABLE
mgogala=# begin;
BEGIN
mgogala=# insert into t1 select generate_series(1,1000);
INSERT 0 1000
mgogala=# insert into a select * from t1;
INSERT 0 1000
mgogala=# commit;
COMMIT
mgogala=# select count(*) from a;
count
-------
1000
(1 row)
mgogala=# select count(*) from t1;
count
-------
0
(1 row)
The table is created with the "on commit delete rows" option, which means
that there is no need to do a "truncate". The "truncate" command is
heavy artillery. Truncating a temporary table is like shooting ducks in
a duck pond with a howitzer.
??? Test it. DELETE is slow, truncate is nearly instantaneous for normal
tables. For temp tables it's the same thing. Maybe in Oracle TRUNCATE is a
howitzer; in Postgres it's lightweight.
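Something along these lines shows the difference (a sketch; the row count is arbitrary and the timings will depend on the system):

create temporary table t_test as select generate_series(1,1000000) as c1;
\timing
delete from t_test;                                    -- time grows with the number of rows
insert into t_test select generate_series(1,1000000);  -- refill the table
truncate t_test;                                       -- near-constant time
\timing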
Truncate has a specific list of tasks to do:
1) Lock the table in exclusive mode to prevent concurrent
transactions on the table.
2) Release the file space and update the table headers.
3) Flush any buffers possibly residing in shared memory.
4) Repeat the procedure on the indexes.
Of course, in the case of a normal table, all of these changes are logged,
possibly producing WAL archives. That is still much faster than a delete,
whose cost depends on the number of rows that need to be deleted, but it is
not exactly lightweight either. In Postgres, truncate recognizes that the
table is a temporary table and takes a few shortcuts, which make the
truncate faster:
1) No need to flush buffers.
2) Locking requirements are much less stringent.
3) No WAL archives are produced.
Temporary tables are completely different beasts in Oracle and Postgres.
Yes, you are right, truncating a temporary table is a big no-no in the
Oracle world, especially in a RAC environment. However, I do find the "ON
COMMIT DELETE ROWS" trick more elegant than an explicit truncate. Here is
the classic Tom Kyte on the topic of truncating temporary tables:
http://tinyurl.com/29kph3p

"NO. truncate is DDL. DDL is expensive. Truncation is something that
should be done very infrequently.

Now, I don't mean "turn your truncates into DELETE's" -- that would
be even worse. I mean -- avoid having to truncate or delete every row
in the first place. Use a transaction based temporary table and upon
commit, it'll empty itself."
Your loop above requires a commit after every 1000 rows. What if you require
that all rows are seen at once or not at all? What if you fail partway
through? One big transaction is often a better idea and/or required.
Especially in Postgres, which has no undo log, bulk inserts in one large
transaction work out very well -- usually better than multiple smaller transactions.
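For example, the whole load can be one transaction (a sketch; the table names and the file path are placeholders, and the merge is the same update/insert pair sketched earlier):

begin;
create temporary table stage (id int, val text) on commit drop;
copy stage from '/path/to/data.csv' csv;   -- placeholder path; or \copy from the client side
-- merge here: bulk update of existing rows, then bulk insert of missing ones
commit;   -- everything becomes visible at once, or not at all if anything fails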
I don't contest that. I also prefer to do things in one big transaction,
if possible.
--
Mladen Gogala
Sr. Oracle DBA
1500 Broadway
New York, NY 10036
(212) 329-5251
http://www.vmsinfo.com
The Leader in Integrated Media Intelligence Solutions