Bear with me all - working my way through this.

First of all, thanks for all the terrific advice.  I think I focused you on
the inserts, when my *REAL* concern is the updates.  Initially, I surmised
that if inserting 100,000 baz took an hour, updating customer_id on, say,
1,000 baz would simply be outrageous.  I now have a better feel for how bad
it is.

I have already learned that making an integer the key of baz, as opposed to
baz_number - a CHAR(15) - cuts my update cost almost in half, so my
reiteration of the example uses this schema change.

Please let me start again and perhaps do a little better job of explaining:

Assume the following (some of the field names have been changed for
confidentiality reasons):

CREATE TABLE baz (
    baz_key int4 NOT NULL,
    baz_number CHAR(15) NOT NULL,
    customer_id CHAR(39),
    foobar_id INTEGER,
    is_cancelled BOOL DEFAULT false NOT NULL,
    create_user VARCHAR(60) NOT NULL,
    create_datetime TIMESTAMP DEFAULT now() NOT NULL,
    last_update_user VARCHAR(60) NOT NULL,
    last_update_datetime TIMESTAMP DEFAULT now() NOT NULL,
    CONSTRAINT PK_baz PRIMARY KEY (baz_key)
);

ALTER TABLE baz
    ADD FOREIGN KEY (customer_id) REFERENCES purchase (customer_id);

ALTER TABLE baz
    ADD FOREIGN KEY (foobar_id) REFERENCES foobar (foobar_id);


There is a purchase table; one purchase can have many associated baz
records, but the 100,000 baz records will be pre-allocated, so
baz.customer_id allows null.  The act of purchasing a baz will cause
baz.customer_id to be populated from the customer_id (key) field in the
purchase table.  Despite its name, the customer_id column is actually the
key of the purchase table.

The foobar table is inconsequential, as it will not be populated until the
baz table is sold out.  So for the inserts and updates, foobar will be
empty.  I probably don't even need to create it until then.


As I said earlier I'm less concerned about the inserts than I am about the
updates.  The reason is the 100,000 inserts will happen before the site is
live.  The updates will happen as part of the purchase process, so updates
need to be as fast as possible.

I needed to do this because I absolutely positively cannot over-allocate
baz.  I cannot allocate more than 100,000 period, and any number of users
can attempt to purchase one or more baz simultaneously.  I am attempting to
avoid a race condition and avoid using database locks as I feared this table
would turn into a bottleneck.

Note, as this question came up more than once after my previous post:
autocommit was off for the inserts.


This will be for a public website and multiple users will be "competing" for
baz resources.  My thought was for each user wishing to purchase one or more
bazes:

- Create a list of potentially available baz:
    SELECT baz_key FROM baz WHERE customer_id IS NULL LIMIT 100;
  - If there are no records in baz with a customer_id of NULL, it's a
    sell-out.
- Iterate through the list attempting to reserve a baz, until you have
  reserved the number of baz requested or the list is exhausted:
    UPDATE baz SET customer_id = <someCustId>
    WHERE baz_key = <currentKeyInList> AND customer_id IS NULL;
  - For a given update, if no record was updated, someone else set the
    customer_id before you could - go to the next baz_key in the list and
    try again.
  - If the list is exhausted, go get the next block of 100 potentially
    available baz keys and go again.
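In Java, the loop above can be sketched roughly like this.  This is an
in-memory stand-in, not my actual JDBC code: a ConcurrentHashMap plays the
role of the baz table, putIfAbsent plays the role of the conditional
UPDATE (both succeed for exactly one concurrent caller), the batch-of-100
candidate SELECT is collapsed into a simple scan, and the class and method
names are made up for illustration:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;

// In-memory sketch of the optimistic reservation loop described above.
// A baz_key is "reserved" once the map holds a customer_id for it.
// putIfAbsent is the stand-in for:
//   UPDATE baz SET customer_id = ? WHERE baz_key = ? AND customer_id IS NULL
// -- only one concurrent caller can win a given key.
public class BazReservation {
    private final ConcurrentHashMap<Integer, String> owners = new ConcurrentHashMap<>();
    private final int totalBaz;

    public BazReservation(int totalBaz) { this.totalBaz = totalBaz; }

    // Try to reserve `wanted` baz keys for customerId; returns the keys won.
    public List<Integer> reserve(String customerId, int wanted) {
        List<Integer> won = new ArrayList<>();
        for (int key = 0; key < totalBaz && won.size() < wanted; key++) {
            // null return means the key was unclaimed and is now ours;
            // non-null means someone else got there first - try the next key.
            if (owners.putIfAbsent(key, customerId) == null) {
                won.add(key);
            }
        }
        return won; // fewer than `wanted` entries means a (partial) sell-out
    }
}
```

The point of the sketch is the compare-and-set shape: no explicit lock is
taken, and a loser on any one key just moves on to the next.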


Anyway, given this scenario, I *HAVE* to have autocommit on for updates, so
that everyone is aware of everyone else immediately.


I wrote a Java simulation of the above that did 1000 updates in 37 seconds.
That left me scratching my head, because when I did the semi-equivalent in
psql:

UPDATE baz SET customer_id = '1234' WHERE baz_key IN ( SELECT baz_key FROM
baz WHERE customer_id IS NULL LIMIT 1000 );

it took 1:27 (one minute 27 seconds) to execute.  This led me (erroneously)
to the conclusion that Postgres was somehow happier doing single updates
than "grouping" them.  I realized today that I missed something in my
simulation (pulling an all-nighter will do that to you): my Java simulation
had autocommit off, and I was doing a single commit at the end.  Obviously
that won't work given what I'm trying to do: any update must *IMMEDIATELY*
be visible to all other processes, or I could get hit with a race condition.
I re-ran with autocommit on, and the timing fell more in line with what I
saw in psql - 1:13.

That seems slow to me.

Is there any way to optimize the update?  Or, perhaps my design is the issue
and I just need to do something else.  Perhaps a lock on the table and an
insert would be quicker.  I'm just worried about locking in a multi-user
environment.  On the other hand, it looks to me like this table will be a
bottleneck no matter what I do.

Your thoughts, as always, are much appreciated.


__________________________________________________________________
John Pagakis
Email: [EMAIL PROTECTED]


"If I had a formula for bypassing trouble, I would not pass it round.
 Trouble creates a capacity to handle it.  I don't embrace trouble; that's
 as bad as treating it as an enemy.  But I do say meet it as a friend, for
 you'll see a lot of it and had better be on speaking terms with it."
        -- Oliver Wendell Holmes

This signature generated by
     ... and I Quote!!(tm) Copyright (c) 1999 SpaZmodic Frog Software, Inc.
                                              www.spazmodicfrog.com


-----Original Message-----
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] Behalf Of John Pagakis
Sent: Thursday, October 23, 2003 5:21 AM
To: [EMAIL PROTECTED]
Subject: [PERFORM] Performance Concern


Greetings.

I have a table that will require 100,000 rows initially.

Assume the following (some of the field names have been changed for
confidentiality reasons):

CREATE TABLE baz (
    baz_number CHAR(15) NOT NULL,
    customer_id CHAR(39),
    foobar_id INTEGER,
    is_cancelled BOOL DEFAULT false NOT NULL,
    create_user VARCHAR(60) NOT NULL,
    create_datetime TIMESTAMP DEFAULT now() NOT NULL,
    last_update_user VARCHAR(60) NOT NULL,
    last_update_datetime TIMESTAMP DEFAULT now() NOT NULL,
    CONSTRAINT PK_baz PRIMARY KEY (baz_number)
);

ALTER TABLE baz
    ADD FOREIGN KEY (customer_id) REFERENCES purchase (customer_id);

ALTER TABLE baz
    ADD FOREIGN KEY (foobar_id) REFERENCES foobar (foobar_id);


Using JDBC, it took approximately one hour to insert 100,000 records.  I
have an algorithm to generate a unique baz_number - it is a mixture of alpha
and numerics.
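(The post doesn't show the baz_number algorithm, so purely as a hypothetical
illustration: one way to get a unique 15-character mixed alpha/numeric value
is to derive it from an already-unique integer such as a sequence value, so
uniqueness of the input guarantees uniqueness of the output.  The "BZ"
prefix and the class/method names here are made up:

```java
// Hypothetical sketch only -- the original post does not show its
// baz_number algorithm.  Encodes an already-unique long (e.g. a sequence
// value) as a fixed-width 15-character alphanumeric string, so no
// collision checking is needed.
public class BazNumber {
    public static String fromKey(long key) {
        String base36 = Long.toString(key, 36).toUpperCase(); // digits + A-Z
        StringBuilder sb = new StringBuilder("BZ");           // made-up prefix
        for (int i = base36.length(); i < 13; i++) {
            sb.append('0');                                   // zero-pad to 13
        }
        return sb.append(base36).toString();                  // always 15 chars
    }
}
```

For example, fromKey(1) yields "BZ0000000000001".)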

There is a purchase table; one purchase can have many associated baz
records, but the baz records will be pre-allocated - baz.customer_id allows
null.  The act of purchasing a baz will cause baz.customer_id to be
populated from the customer_id (key) field in the purchase table.

If it took an hour to insert 100,000 records, I can only imagine how much
time it will take if one customer were to attempt to purchase all 100,000
baz.  Certainly too long for a web page.

I've not had to deal with this kind of volume in Postgres before; I have my
suspicions on what is wrong here (could it be using a CHAR( 15 ) as a key?)
but I'd *LOVE* any thoughts.

Would I be better off making the key an identity field and not indexing on
baz_number?

Thanks in advance for any help.

__________________________________________________________________
John Pagakis
Email: [EMAIL PROTECTED]


"The best way to make your dreams come true is to wake up."
        -- Paul Valery

This signature generated by
     ... and I Quote!!(tm) Copyright (c) 1999 SpaZmodic Frog Software, Inc.
                                              www.spazmodicfrog.com

