[EMAIL PROTECTED] ("John Pagakis") writes:
> Greetings.
>
> I have a table that will require 100,000 rows initially.
>
> Assume the following (some of the field names have been changed for
> confidentiality reasons):
>
> CREATE TABLE baz (
>     baz_number CHAR(15) NOT NULL,
>     customer_id CHAR(39),
>     foobar_id INTEGER,
>     is_cancelled BOOL DEFAULT false NOT NULL,
>     create_user VARCHAR(60) NOT NULL,
>     create_datetime TIMESTAMP DEFAULT 'now()' NOT NULL,
>     last_update_user VARCHAR(60) NOT NULL,
>     last_update_datetime TIMESTAMP DEFAULT 'now()' NOT NULL,
>     CONSTRAINT PK_baz PRIMARY KEY (baz_number)
> );
>
> ALTER TABLE baz
>     ADD FOREIGN KEY (customer_id) REFERENCES purchase (customer_id);
>
> ALTER TABLE baz
>     ADD FOREIGN KEY (foobar_id) REFERENCES foobar (foobar_id);
>
> Using JDBC, it took approximately one hour to insert 100,000 records.  I
> have an algorithm to generate a unique baz_number - it is a mixture of alpha
> and numerics.

Question #1:  How did you do the inserts?

If AUTO-COMMIT was turned on, then you effectively ran 100,000 separate
transactions, and that alone would contribute considerably to the
process being slow.  Put all of the inserts into a single transaction
and you'd probably see it run in a fraction of the time.
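
At the SQL level that just means wrapping the whole load in an explicit
transaction; over JDBC the equivalent is calling setAutoCommit(false)
on the Connection and committing once at the end.  Schematically (the
values here are only placeholders):

BEGIN;
INSERT INTO baz (baz_number, create_user, last_update_user)
    VALUES ('BAZ000000000001', 'someuser', 'someuser');
INSERT INTO baz (baz_number, create_user, last_update_user)
    VALUES ('BAZ000000000002', 'someuser', 'someuser');
-- ... and so on for the rest of the 100,000 rows ...
COMMIT;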

Question #2:  Do you have indices on purchase(customer_id) and on
foobar(foobar_id)?

If not, then each foreign key check has to scan the referenced table,
which is rather inefficient.
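
If those columns are the primary keys of their tables, they are indexed
already; otherwise something along these lines would help (the index
names are just made up):

CREATE INDEX purchase_customer_id_idx ON purchase (customer_id);
CREATE INDEX foobar_foobar_id_idx ON foobar (foobar_id);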

> There is a purchase table; one purchase can have many associated baz
> records, but the baz records will be pre-allocated - baz.customer_id
> allows null.  The act of purchasing a baz will cause baz.customer_id
> to be populated from the customer_id (key) field in the purchase
> table.
>
> If it took an hour to insert 100,000 records, I can only imagine how
> much time it will take if one customer were to attempt to purchase
> all 100,000 baz.  Certainly too long for a web page.

I take it that each "baz" is a uniquely identifiable product, akin to
(say) an RSA certificate or the like?

By the way, if you set up a stored procedure in PostgreSQL that can
generate the "baz_number" identifiers, you could probably do the
inserts Right Well Fast...

Consider the following.  I have a stored procedure, genauth(), which
generates quasi-random values.  (They're passwords, sort of...)

cctld=# explain analyze insert into baz (baz_number, create_user, last_update_user)
cctld-#   select substr(genauth(), 1, 15), 'cbbrowne', 'cbbrowne' from big_table;
                                                  QUERY PLAN
---------------------------------------------------------------------------------------------------------------
 Seq Scan on big_table  (cost=0.00..789.88 rows=28988 width=0) (actual time=0.20..1713.60 rows=28988 loops=1)
 Total runtime: 3197.40 msec
(2 rows)

It took about 3 seconds to insert 28988 rows into baz.  (big_table,
also renamed to protect the innocent, has 28988 rows; I didn't care
about its contents, just that it had a bunch of rows.)

And the above is on a cheap desktop PC with IDE disk.
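
For what it's worth, a generator in that style can live in a plpgsql
function.  The sketch below is purely illustrative (it is not
genauth(), nor John's algorithm, and plain random() characters would
need real collision handling before serving as a primary key), but it
shows the shape of keeping the generation server-side so the whole
load becomes one INSERT ... SELECT:

-- Hypothetical: build a 15-character mixed alphanumeric identifier.
CREATE OR REPLACE FUNCTION gen_baz_number() RETURNS text AS '
DECLARE
    chars  text := ''ABCDEFGHJKLMNPQRSTUVWXYZ23456789'';
    result text := '''';
BEGIN
    FOR i IN 1..15 LOOP
        -- append one character picked at random from the candidate set
        result := result ||
                  substr(chars, 1 + floor(random() * length(chars))::integer, 1);
    END LOOP;
    RETURN result;
END;
' LANGUAGE plpgsql;

-- The load is then a single statement; as above, big_table is only a
-- convenient source of the right number of rows.
INSERT INTO baz (baz_number, create_user, last_update_user)
    SELECT gen_baz_number(), 'someuser', 'someuser'
    FROM big_table;

Once it's a single statement like that, the autocommit question goes
away, too.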

> I've not had to deal with this kind of volume in Postgres before; I
> have my suspicions on what is wrong here (could it be using a CHAR(
> 15 ) as a key?)  but I'd *LOVE* any thoughts.

> Would I be better off making the key an identity field and not
> indexing on baz_number?

That might be something of an improvement, but using a text field as
the key rather than an integer oughtn't be cripplingly different on
its own.
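
If you do want to experiment with that, the surrogate-key variant
would look something like this (just a sketch; the baz_id name is made
up, and baz_number keeps a unique constraint so it is still enforced
and still indexed):

CREATE TABLE baz (
    baz_id      SERIAL PRIMARY KEY,        -- surrogate integer key
    baz_number  CHAR(15) NOT NULL UNIQUE,  -- natural id, indexed via the unique constraint
    customer_id CHAR(39),
    foobar_id   INTEGER
    -- plus the audit columns from the original definition
);

But I wouldn't expect that alone to rescue the hour-long load.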

What's crippling is submitting 100,000 queries in 100,000
transactions.  Cut THAT down to size and you'll see performance return
to being reasonable.
-- 
"cbbrowne","@","libertyrms.info"
<http://dev6.int.libertyrms.com/>
Christopher Browne
(416) 646 3304 x124 (land)
