Re: [PERFORM] Performance Concern

Sean Shanny Fri, 24 Oct 2003 18:15:51 -0700

John,

One other thing I forgot to mention with solution 2. If you are going to be adding a fair number of records to the table on an ongoing basis you will want to drop indexes first and re-create them after the load is complete. I have tried it both ways and dropping is faster overall.

--sean

John Pagakis wrote:

Sean -
I believe auto-commit was off (not at the box right now).  I'll play with
the commit interval; I know commits are expensive operations.

Thanks for item 2.  I was toying with the notion of pre-creating 100000
bazes off-loading them and then seeing if the COPY would be any faster; you
saved me the effort of experimenting.  Thanks for the benefit of your
experience.

__________________________________________________________________
John Pagakis
Email: [EMAIL PROTECTED]


"Oh, you hate your job?  Why didn't you say so?
There's a support group for that. It's called
EVERYBODY, and they meet at the bar."
       -- Drew Carey

This signature generated by
    ... and I Quote!!(tm) Copyright (c) 1999 SpaZmodic Frog Software, Inc.
                                             www.spazmodicfrog.com


-----Original Message-----
From: Sean Shanny [mailto:[EMAIL PROTECTED]
Sent: Friday, October 24, 2003 11:31 AM
To: [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]
Subject: Re: [PERFORM] Performance Concern

John,

Are you treating each insertion as a separate transaction?  If so the
performance will suffer.  I am doing the same thing in building a data
warehouse using PG.  I have to load millions of records each night.  I
do two different things:

1) If I need to keep the insertions inside the java process I turn off
auto-commit and every n insertions (5000 seems to give me the best
performance for my setup) issue a commit.  Make sure you do a final
commit in a finally block so you don't miss anything.

2) Dump all the data to a file and then use a psql COPY <table>
(columns) FROM 'file path' call to load it.  Very fast.

--sean

John Pagakis wrote:

Greetings.

I have a table that will require 100,000 rows initially.

Assume the following (some of the field names have been changed for
confidentiality reasons):

CREATE TABLE baz (
  baz_number CHAR(15) NOT NULL,
  customer_id CHAR(39),
  foobar_id INTEGER,
  is_cancelled BOOL DEFAULT false NOT NULL,
  create_user VARCHAR(60) NOT NULL,
  create_datetime TIMESTAMP DEFAULT 'now()' NOT NULL,
  last_update_user VARCHAR(60) NOT NULL,
  last_update_datetime TIMESTAMP DEFAULT 'now()' NOT NULL,
  CONSTRAINT PK_baz PRIMARY KEY (baz_number)
);

ALTER TABLE baz
  ADD FOREIGN KEY (customer_id) REFERENCES purchase (customer_id);

ALTER TABLE baz
  ADD FOREIGN KEY (foobar_id) REFERENCES foobar (foobar_id);

Using JDBC, it took approximately one hour to insert 100,000 records. I have an algorithm to generate a unique baz_number - it is a mixture of

alpha

and numerics.

There is a purchase table; one purchase can have many associated baz
records, but the baz records will be pre-allocated - baz.customer_id allows
null.  The act of purchasing a baz will cause baz.customer_id to be
populated from the customer_id (key) field in the purchase table.

If it took an hour to insert 100,000 records, I can only imagine how much
time it will take if one customer were to attempt to purchase all 100,000
baz.  Certainly too long for a web page.

I've not had to deal with this kind of volume in Postgres before; I have my
suspicions on what is wrong here (could it be using a CHAR( 15 ) as a key?)
but I'd *LOVE* any thoughts.

Would I be better off making the key an identity field and not indexing on
baz_number?

Thanks in advance for any help.

__________________________________________________________________
John Pagakis
Email: [EMAIL PROTECTED]


"The best way to make your dreams come true is to wake up."
      -- Paul Valery

This signature generated by
   ... and I Quote!!(tm) Copyright (c) 1999 SpaZmodic Frog Software, Inc.
                                            www.spazmodicfrog.com


---------------------------(end of broadcast)---------------------------
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]

---------------------------(end of broadcast)---------------------------
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]

Re: [PERFORM] Performance Concern

Reply via email to