One other thing I forgot to mention with solution 2. If you are going to be adding a fair number of records to the table on an ongoing basis you will want to drop indexes first and re-create them after the load is complete. I have tried it both ways and dropping is faster overall.
--sean
John Pagakis wrote:
Sean - I believe auto-commit was off (not at the box right now). I'll play with the commit interval; I know commits are expensive operations.
Thanks for item 2. I was toying with the notion of pre-creating 100000 bazes off-loading them and then seeing if the COPY would be any faster; you saved me the effort of experimenting. Thanks for the benefit of your experience.
__________________________________________________________________ John Pagakis Email: [EMAIL PROTECTED]
"Oh, you hate your job? Why didn't you say so? There's a support group for that. It's called EVERYBODY, and they meet at the bar." -- Drew Carey
This signature generated by ... and I Quote!!(tm) Copyright (c) 1999 SpaZmodic Frog Software, Inc. www.spazmodicfrog.com
-----Original Message----- From: Sean Shanny [mailto:[EMAIL PROTECTED] Sent: Friday, October 24, 2003 11:31 AM To: [EMAIL PROTECTED] Cc: [EMAIL PROTECTED] Subject: Re: [PERFORM] Performance Concern
John,
Are you treating each insertion as a separate transaction? If so the performance will suffer. I am doing the same thing in building a data warehouse using PG. I have to load millions of records each night. I do two different things:
1) If I need to keep the insertions inside the java process I turn off auto-commit and every n insertions (5000 seems to give me the best performance for my setup) issue a commit. Make sure you do a final commit in a finally block so you don't miss anything.
2) Dump all the data to a file and then use a psql COPY <table> (columns) FROM 'file path' call to load it. Very fast.
--sean
John Pagakis wrote:
Greetings.alpha
I have a table that will require 100,000 rows initially.
Assume the following (some of the field names have been changed for confidentiality reasons):
CREATE TABLE baz ( baz_number CHAR(15) NOT NULL, customer_id CHAR(39), foobar_id INTEGER, is_cancelled BOOL DEFAULT false NOT NULL, create_user VARCHAR(60) NOT NULL, create_datetime TIMESTAMP DEFAULT 'now()' NOT NULL, last_update_user VARCHAR(60) NOT NULL, last_update_datetime TIMESTAMP DEFAULT 'now()' NOT NULL, CONSTRAINT PK_baz PRIMARY KEY (baz_number) );
ALTER TABLE baz ADD FOREIGN KEY (customer_id) REFERENCES purchase (customer_id);
ALTER TABLE baz ADD FOREIGN KEY (foobar_id) REFERENCES foobar (foobar_id);
Using JDBC, it took approximately one hour to insert 100,000 records. I
have an algorithm to generate a unique baz_number - it is a mixture of
and numerics.
There is a purchase table; one purchase can have many associated baz records, but the baz records will be pre-allocated - baz.customer_id allows null. The act of purchasing a baz will cause baz.customer_id to be populated from the customer_id (key) field in the purchase table.
If it took an hour to insert 100,000 records, I can only imagine how much time it will take if one customer were to attempt to purchase all 100,000 baz. Certainly too long for a web page.
I've not had to deal with this kind of volume in Postgres before; I have my suspicions on what is wrong here (could it be using a CHAR( 15 ) as a key?) but I'd *LOVE* any thoughts.
Would I be better off making the key an identity field and not indexing on baz_number?
Thanks in advance for any help.
__________________________________________________________________ John Pagakis Email: [EMAIL PROTECTED]
"The best way to make your dreams come true is to wake up." -- Paul Valery
This signature generated by ... and I Quote!!(tm) Copyright (c) 1999 SpaZmodic Frog Software, Inc. www.spazmodicfrog.com
---------------------------(end of broadcast)--------------------------- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
---------------------------(end of broadcast)--------------------------- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]