[PERFORM] Big question on insert performance/using COPY FROM

Morgan Kita Wed, 31 Aug 2005 12:20:38 -0700

Hi,

I am currently trying to speed up the insertion of bulk loads to my database. I 
have fiddled with all of the parameters that I have seen suggested(aka 
checkpoint_segments, checkpoint_timeout, maintinence_work_mem, and shared 
buffers) with no success. I even turned off fysnc with no effect so I am pretty 
sure the biggest problem is that the DB is CPU limited at the moment because of 
the rather weak machine that postmaster is running on(Athlon 2400+ xp with 512 
RAM), but that will change in the future so I am trying to get performance 
increases that don't involve changing the machine at the moment.


I am currently inserting into the database through lipqxx's C++ interface. I am 
using prepared statements that perform regular inserts. I would like to use 
COPY FROM since I have read so much about its increased performance with 
respect to INSERT, but I am not sure how to use it in my case. So let me give 
you an idea on how the tables are laid out. 

The real DB has more tables, but lets say for the sake of argument I have 3 
tables; TB1, TB2, TB3. Lets say that TB1 has a primary key PK1 and a unique 
identifier column(type text) UK1 that has an index on it. TB2 then has a PK2, a 
UK2(type text) of its own with an index, and a foreign key FK2 that points to 
TB1's PK1. TB3 has a PK3 and a FK3 that points to FK2. 
TB1             TB2                               TB3
--------------     -------------------------------     ----------------------
PK1, UK1     PK2, UK2, FK2(PK1)     PK3, FK3(PK2)

Now in lipqxx I am parsing an input list of objects that are then written to 
these tables. Each object may produce one row in TB1, one row in TB2, and one 
row in TB3. The UK1 and UK2 indentifiers are used to prevent duplicate entries 
for TB1 and TB2 respectively. I know COPY FROM obeys these unique checks; 
however, my problem is the FKs. So lets say I try to insert a row into TB1. If 
it is unique on UK1 then it inserts a new row with some new primary key int4 
identifier and if it is a duplicate then no insert is done but the already 
existing row's primary key identifier is returned. This identifier(duplicate or 
not) is used when populating TB2's row as the FK2 identifier. The row that is 
to be inserted into TB2 needs the primary key indentifier from the result of 
the attempted insert into TB1. Similarily the insert into TB3 needs the result 
of the pk indentifier of the attempted insert into TB2. Once that is done then 
I move on to parsing the next object for insertion into the 3 tables.

So lets say I want to insert a list of objects using COPY FROM... whats the way 
to do it? How can I at the very least get a list of the primary keys of 
TB1(newly inserted rows or from already existings row) returned from the COPY 
FROM insert into TB1 so I can use them for the COPY FROM insert into TB2 and so 
on? Is there a better way to do this?

P.S. I am going to setup autovacuum for these bulk loads. My question though is 
why for bulkloads is VACUUM useful? I understand that it frees up dead rows as 
a result of UPDATE and such, but where are the dead rows created from plain 
INSERTS?

Thanks,
Morgan

---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend

[PERFORM] Big question on insert performance/using COPY FROM

Reply via email to