Re: [PERFORM] Performance Concern

2003-10-27 Thread Manfred Koizar
On Sun, 26 Oct 2003 00:13:36 +0300, Hannu Krosing [EMAIL PROTECTED] wrote:
UPDATE baz
   SET customer_id = '1234'
 WHERE baz_key IN (
SELECT baz_key
  FROM baz innerbaz
 WHERE customer_id IS NULL
   and innerbaz.baz_key = baz.baz_key
 LIMIT 1000 );

AFAICS this is not what the OP intended.  It is equivalent to 

UPDATE baz
   SET customer_id = '1234'
 WHERE customer_id IS NULL;

because the subselect is now correlated to the outer query: it is
re-evaluated once for each row of the outer query, which makes the
LIMIT clause ineffective.
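
Presumably the intended statement is the uncorrelated form, where the
LIMIT is applied just once -- e.g. (untested):

UPDATE baz
   SET customer_id = '1234'
 WHERE baz_key IN (
        SELECT baz_key
          FROM baz
         WHERE customer_id IS NULL
         LIMIT 1000);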

Servus
 Manfred



Re: [PERFORM] Performance Concern

2003-10-25 Thread Allen Landsidel
At 05:56 10/25/2003, John Pagakis wrote:

Snipping most of this, I only have one suggestion/comment to make.

[snip]

CREATE TABLE baz (
baz_key int4 NOT NULL,
baz_number CHAR(15) NOT NULL,
customer_id CHAR(39),
foobar_id INTEGER,
is_cancelled BOOL DEFAULT false NOT NULL,
create_user VARCHAR(60) NOT NULL,
create_datetime TIMESTAMP DEFAULT now() NOT NULL,
last_update_user VARCHAR(60) NOT NULL,
last_update_datetime TIMESTAMP DEFAULT now() NOT NULL,
CONSTRAINT PK_baz PRIMARY KEY (baz_key)
);
ALTER TABLE baz
ADD FOREIGN KEY (customer_id) REFERENCES purchase (customer_id);
ALTER TABLE baz
ADD FOREIGN KEY (foobar_id) REFERENCES foobar (foobar_id);
[snip]

I needed to do this because I absolutely, positively cannot over-allocate
baz.  I cannot allocate more than 100,000, period, and any number of users
can attempt to purchase one or more baz simultaneously.  I am attempting to
avoid a race condition and avoid using database locks, as I feared this table
would turn into a bottleneck.
[snip]

I have a similar situation in the database here, using the following 
example schema:

CREATE TABLE foo
(
  nID serial UNIQUE NOT NULL,
  bAvailable boolean NOT NULL DEFAULT true,
  nSomeField int4 NOT NULL,
  sSomeField text NOT NULL
);
CREATE TABLE bar
(
  nfoo_id int4 UNIQUE NOT NULL
);
Assume foo is the table with the 100k pre-populated records that you want 
to assign to visitors on your site.  bar is a table whose only purpose is to 
eliminate race conditions, working off the following business rules:

1. -- someone attempts to get a 'foo'
  SELECT nID FROM foo WHERE bAvailable;
2. -- we first try to assign this 'foo' to ourselves
   -- the ? is bound to the foo.nID we selected in step 1.
  INSERT INTO bar (nfoo_ID) VALUES (?);
3. -- only if step 2 is successful do we continue; otherwise someone beat us to it
  UPDATE foo SET ... WHERE nID = ?;

The key here is step 2.

Since there is a UNIQUE constraint defined on bar.nfoo_id (it could even 
be an FK), only one INSERT will ever succeed; all the others will fail.  In 
step 3, you can set the bAvailable flag to false, along with whatever other 
values you need to set for your 'baz'.
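
Putting the three steps together, a minimal sketch -- the literal 42 stands
in for whatever foo.nID was fetched in step 1, and each statement runs in
its own transaction, so a failed step 2 just means you retry with another nID:

-- step 1: find a candidate 'foo'
SELECT nID FROM foo WHERE bAvailable LIMIT 1;

-- step 2: try to claim it; the UNIQUE constraint on bar.nfoo_id
-- guarantees that only one concurrent INSERT for nID 42 can succeed
INSERT INTO bar (nfoo_ID) VALUES (42);

-- step 3: run this only if the INSERT above succeeded
UPDATE foo SET bAvailable = false WHERE nID = 42;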

This will get much easier once 7.4 is production-ready, as the WHERE IN ... 
and WHERE NOT IN ... subselects are (according to the HISTORY file) going to 
be as efficient as joins, instead of the O(n) operation they apparently are 
right now.

Until then, however, I've found this simple trick works remarkably well.

-Allen 



[PERFORM] Performance Concern

2003-10-24 Thread John Pagakis
Greetings.

I have a table that will require 100,000 rows initially.

Assume the following (some of the field names have been changed for
confidentiality reasons):

CREATE TABLE baz (
baz_number CHAR(15) NOT NULL,
customer_id CHAR(39),
foobar_id INTEGER,
is_cancelled BOOL DEFAULT false NOT NULL,
create_user VARCHAR(60) NOT NULL,
create_datetime TIMESTAMP DEFAULT now() NOT NULL,
last_update_user VARCHAR(60) NOT NULL,
last_update_datetime TIMESTAMP DEFAULT now() NOT NULL,
CONSTRAINT PK_baz PRIMARY KEY (baz_number)
);

ALTER TABLE baz
ADD FOREIGN KEY (customer_id) REFERENCES purchase (customer_id);

ALTER TABLE baz
ADD FOREIGN KEY (foobar_id) REFERENCES foobar (foobar_id);


Using JDBC, it took approximately one hour to insert 100,000 records.  I
have an algorithm to generate a unique baz_number - it is a mixture of alpha
and numerics.

There is a purchase table; one purchase can have many associated baz
records, but the baz records will be pre-allocated - baz.customer_id allows
null.  The act of purchasing a baz will cause baz.customer_id to be
populated from the customer_id (key) field in the purchase table.
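
In other words, at purchase time something roughly like this runs (the bind
values are placeholders):

UPDATE baz
   SET customer_id = ?
 WHERE baz_number = ?;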

If it took an hour to insert 100,000 records, I can only imagine how long it
would take if one customer were to attempt to purchase all 100,000
baz.  Certainly too long for a web page.

I've not had to deal with this kind of volume in Postgres before; I have my
suspicions on what is wrong here (could it be using a CHAR( 15 ) as a key?)
but I'd *LOVE* any thoughts.

Would I be better off making the key an identity field and not indexing on
baz_number?
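
For concreteness, the identity-field variant I have in mind would be
something like this (untested):

CREATE TABLE baz (
    baz_key serial PRIMARY KEY,     -- surrogate key instead of baz_number
    baz_number CHAR(15) NOT NULL    -- kept, but with no index on it
    -- ... remaining columns as in the original definition ...
);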

Thanks in advance for any help.

__
John Pagakis
Email: [EMAIL PROTECTED]


The best way to make your dreams come true is to wake up.
-- Paul Valery

This signature generated by
 ... and I Quote!!(tm) Copyright (c) 1999 SpaZmodic Frog Software, Inc.
  www.spazmodicfrog.com




Re: [PERFORM] Performance Concern

2003-10-24 Thread Sean Shanny
John,

Are you treating each insertion as a separate transaction?  If so, the 
performance will suffer.  I am doing the same thing in building a data 
warehouse using PG; I have to load millions of records each night.  I 
do two different things:

1) If I need to keep the insertions inside the Java process, I turn off 
auto-commit and issue a commit every n insertions (5000 seems to give me 
the best performance for my setup).  Make sure you do a final commit in a 
finally block so you don't miss anything.

2) Dump all the data to a file and then use a psql "COPY table 
(columns) FROM 'file path'" call to load it.  Very fast.
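
For example, in psql terms -- placeholder values and a placeholder file path:

-- (1) the batching pattern: one transaction per ~5000 rows
BEGIN;
INSERT INTO baz (baz_number, create_user, last_update_user)
    VALUES ('ABC000000000001', 'loader', 'loader');
INSERT INTO baz (baz_number, create_user, last_update_user)
    VALUES ('ABC000000000002', 'loader', 'loader');
-- ... and so on, up to ~5000 rows, then:
COMMIT;

-- (2) bulk load from a flat file
COPY baz (baz_number, customer_id, foobar_id, is_cancelled,
          create_user, create_datetime, last_update_user,
          last_update_datetime)
FROM '/tmp/baz.dat';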

--sean

John Pagakis wrote:

[snip]


Re: [PERFORM] Performance Concern

2003-10-24 Thread John Pagakis
Sean -
I believe auto-commit was off (not at the box right now).  I'll play with
the commit interval; I know commits are expensive operations.

Thanks for item 2.  I was toying with the notion of pre-creating 100,000
bazes, off-loading them, and then seeing if the COPY would be any faster; you
saved me the effort of experimenting.  Thanks for the benefit of your
experience.

__
John Pagakis
Email: [EMAIL PROTECTED]


Oh, you hate your job?  Why didn't you say so?
 There's a support group for that. It's called
 EVERYBODY, and they meet at the bar.
-- Drew Carey

This signature generated by
 ... and I Quote!!(tm) Copyright (c) 1999 SpaZmodic Frog Software, Inc.
  www.spazmodicfrog.com


-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED]] On Behalf Of Sean Shanny
Sent: Friday, October 24, 2003 11:31 AM
To: [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]
Subject: Re: [PERFORM] Performance Concern


[snip]


Re: [PERFORM] Performance Concern

2003-10-24 Thread Sean Shanny
John,

One other thing I forgot to mention with solution 2.  If you are going 
to be adding a fair number of records to the table on an ongoing basis, 
you will want to drop the indexes first and re-create them after the load 
is complete.  I have tried it both ways, and dropping is faster overall.
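
For instance, assuming a hypothetical index name and the same kind of
placeholder file path as before:

DROP INDEX baz_customer_id_idx;   -- hypothetical index name
COPY baz (baz_number, customer_id, foobar_id, is_cancelled,
          create_user, create_datetime, last_update_user,
          last_update_datetime)
FROM '/tmp/baz.dat';
CREATE INDEX baz_customer_id_idx ON baz (customer_id);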

--sean

John Pagakis wrote:

[snip]