Re: [PERFORM] Performance Concern
On Sun, 26 Oct 2003 00:13:36 +0300, Hannu Krosing [EMAIL PROTECTED] wrote:

    UPDATE baz
       SET customer_id = '1234'
     WHERE baz_key IN (
           SELECT baz_key
             FROM baz innerbaz
            WHERE customer_id IS NULL
              AND innerbaz.baz_key = baz.baz_key
            LIMIT 1000
     );

AFAICS this is not what the OP intended. It is equivalent to

    UPDATE baz
       SET customer_id = '1234'
     WHERE customer_id IS NULL;

because the subselect is now correlated with the outer query and is evaluated once for each row of the outer query, which makes the LIMIT clause ineffective.

Servus
 Manfred
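For reference, an uncorrelated version that does what the OP was presumably after; because the subselect no longer references the outer baz, the LIMIT actually caps the update at 1000 rows (which 1000 rows you get is arbitrary, since there is no ORDER BY):

    UPDATE baz
       SET customer_id = '1234'
     WHERE baz_key IN (
           SELECT baz_key
             FROM baz
            WHERE customer_id IS NULL
            LIMIT 1000
     );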
Re: [PERFORM] Performance Concern
At 05:56 10/25/2003, John Pagakis wrote:

Snipping most of this; I only have one suggestion/comment to make.

[snip]

    CREATE TABLE baz (
        baz_key int4 NOT NULL,
        baz_number CHAR(15) NOT NULL,
        customer_id CHAR(39),
        foobar_id INTEGER,
        is_cancelled BOOL DEFAULT false NOT NULL,
        create_user VARCHAR(60) NOT NULL,
        create_datetime TIMESTAMP DEFAULT now() NOT NULL,
        last_update_user VARCHAR(60) NOT NULL,
        last_update_datetime TIMESTAMP DEFAULT now() NOT NULL,
        CONSTRAINT PK_baz PRIMARY KEY (baz_key)
    );

    ALTER TABLE baz ADD FOREIGN KEY (customer_id) REFERENCES purchase (customer_id);
    ALTER TABLE baz ADD FOREIGN KEY (foobar_id) REFERENCES foobar (foobar_id);

[snip]

 I needed to do this because I absolutely, positively cannot over-allocate baz. I cannot allocate more than 100,000, period, and any number of users can attempt to purchase one or more baz simultaneously. I am attempting to avoid a race condition and avoid using database locks, as I feared this table would turn into a bottleneck.

[snip]

I have a similar situation in the database here, using the following example schema:

    CREATE TABLE foo (
        nID serial UNIQUE NOT NULL,
        bAvailable boolean NOT NULL DEFAULT true,
        nSomeField int4 NOT NULL,
        sSomeField text NOT NULL
    );

    CREATE TABLE bar (
        nfoo_id int4 UNIQUE NOT NULL
    );

Assume foo is the table with the 100k pre-populated records that you want to assign to visitors on your site. bar is a table whose only purpose is to eliminate race conditions, working off the following business rules:

    1. -- someone attempts to get a 'foo'
       SELECT nID FROM foo WHERE bAvailable;

    2. -- we first try to assign this 'foo' to ourselves;
       -- the ? is bound to the foo.nID we selected in step 1
       INSERT INTO bar (nfoo_id) VALUES (?);

    3. -- only if step 2 is successful do we continue;
       -- otherwise someone beat us to it
       UPDATE foo SET ... WHERE nID = ?;

The key here is step 2. Since there is a UNIQUE constraint defined on bar.nfoo_id (it could even be an FK), only one INSERT for a given nID will ever succeed; all others will fail. In step 3 you can set the bAvailable flag to false, along with whatever other values you need to set for your 'baz'.

This will get much easier once 7.4 is production-ready, as the WHERE IN .. or WHERE NOT IN .. subselects are (according to the HISTORY file) going to be as efficient as joins, instead of the O(n) operation they apparently are right now. Until then, however, I've found this simple trick works remarkably well.

-Allen
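Putting Allen's three steps together: a minimal sketch of one complete attempt, with the nID from step 1 hard-coded as 42 for illustration. The LIMIT 1 and the transaction wrapper are assumptions added for the sketch, not part of Allen's post:

    BEGIN;

    -- step 1: find a candidate foo (assume this returned nID = 42)
    SELECT nID FROM foo WHERE bAvailable LIMIT 1;

    -- step 2: try to claim it; the UNIQUE constraint on bar.nfoo_id
    -- guarantees at most one session can ever insert a given nID,
    -- so a duplicate-key error here means someone beat us to it
    INSERT INTO bar (nfoo_id) VALUES (42);

    -- step 3: reached only if the INSERT succeeded
    UPDATE foo SET bAvailable = false WHERE nID = 42;

    COMMIT;

If the INSERT in step 2 fails, roll back and retry from step 1 with a different nID; the transaction wrapper also ensures the claim in bar disappears if the UPDATE fails.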
[PERFORM] Performance Concern
Greetings.

I have a table that will require 100,000 rows initially. Assume the following (some of the field names have been changed for confidentiality reasons):

    CREATE TABLE baz (
        baz_number CHAR(15) NOT NULL,
        customer_id CHAR(39),
        foobar_id INTEGER,
        is_cancelled BOOL DEFAULT false NOT NULL,
        create_user VARCHAR(60) NOT NULL,
        create_datetime TIMESTAMP DEFAULT now() NOT NULL,
        last_update_user VARCHAR(60) NOT NULL,
        last_update_datetime TIMESTAMP DEFAULT now() NOT NULL,
        CONSTRAINT PK_baz PRIMARY KEY (baz_number)
    );

    ALTER TABLE baz ADD FOREIGN KEY (customer_id) REFERENCES purchase (customer_id);
    ALTER TABLE baz ADD FOREIGN KEY (foobar_id) REFERENCES foobar (foobar_id);

Using JDBC, it took approximately one hour to insert 100,000 records. I have an algorithm to generate a unique baz_number; it is a mixture of alphas and numerics.

There is a purchase table; one purchase can have many associated baz records, but the baz records will be pre-allocated, so baz.customer_id allows NULL. The act of purchasing a baz will cause baz.customer_id to be populated from the customer_id (key) field in the purchase table.

If it took an hour to insert 100,000 records, I can only imagine how long it will take if one customer were to attempt to purchase all 100,000 baz. Certainly too long for a web page.

I've not had to deal with this kind of volume in Postgres before; I have my suspicions about what is wrong here (could it be using a CHAR(15) as a key?) but I'd *LOVE* any thoughts.

Would I be better off making the key an identity field and not indexing on baz_number?

Thanks in advance for any help.

__
John Pagakis
Email: [EMAIL PROTECTED]

The best way to make your dreams come true is to wake up.
        -- Paul Valery

This signature generated by
     ... and I Quote!!(tm) Copyright (c) 1999 SpaZmodic Frog Software, Inc.
     www.spazmodicfrog.com
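For concreteness, the identity-field alternative asked about above might look like the sketch below. SERIAL as the PostgreSQL spelling of an identity field is an assumption here; the point is that the primary key becomes a plain integer while baz_number keeps a UNIQUE constraint:

    CREATE TABLE baz (
        baz_key     SERIAL PRIMARY KEY,        -- integer surrogate key
        baz_number  CHAR(15) UNIQUE NOT NULL,  -- still enforced, no longer the PK
        customer_id CHAR(39),
        foobar_id   INTEGER
        -- remaining columns as in the schema above
    );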
Re: [PERFORM] Performance Concern
John,

Are you treating each insertion as a separate transaction? If so, the performance will suffer. I am doing the same thing in building a data warehouse using PG; I have to load millions of records each night. I do two different things:

1) If I need to keep the insertions inside the Java process, I turn off auto-commit and issue a commit every n insertions (5000 seems to give me the best performance for my setup). Make sure you do a final commit in a finally block so you don't miss anything.

2) Dump all the data to a file and then use a psql COPY table (columns) FROM 'file path' call to load it. Very fast.

--sean

John Pagakis wrote:
[snip]
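In SQL terms, the two approaches look roughly like the sketch below; the file path and the column list are illustrative assumptions, not details from Sean's setup:

    -- (1) batch many INSERTs into a single transaction; this is what
    --     turning off auto-commit and committing every n rows in JDBC
    --     amounts to on the server side
    BEGIN;
    INSERT INTO baz (baz_number, create_user, last_update_user)
    VALUES ('ABC123DEF456GHI', 'loader', 'loader');
    -- ... repeat for up to ~5000 rows, then ...
    COMMIT;

    -- (2) bulk-load a flat (tab-delimited) file in one COPY call
    COPY baz (baz_number, create_user, last_update_user)
    FROM '/tmp/baz.dat';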
Re: [PERFORM] Performance Concern
Sean -

I believe auto-commit was off (I'm not at the box right now). I'll play with the commit interval; I know commits are expensive operations.

Thanks for item 2. I was toying with the notion of pre-creating 10 bazes, off-loading them, and then seeing if the COPY would be any faster; you saved me the effort of experimenting. Thanks for the benefit of your experience.

__
John Pagakis
Email: [EMAIL PROTECTED]

Oh, you hate your job? Why didn't you say so? There's a support group for that. It's called EVERYBODY, and they meet at the bar.
        -- Drew Carey

This signature generated by
     ... and I Quote!!(tm) Copyright (c) 1999 SpaZmodic Frog Software, Inc.
     www.spazmodicfrog.com

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Sean Shanny
Sent: Friday, October 24, 2003 11:31 AM
To: [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]
Subject: Re: [PERFORM] Performance Concern

[snip]
Re: [PERFORM] Performance Concern
John,

One other thing I forgot to mention with solution 2: if you are going to be adding a fair number of records to the table on an ongoing basis, you will want to drop the indexes first and re-create them after the load is complete. I have tried it both ways, and dropping is faster overall.

--sean

John Pagakis wrote:
[snip]
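A sketch of that drop/load/re-create pattern against the baz schema from this thread. The assumptions here are that the primary key is the only index on the table and that going without uniqueness enforcement for the duration of the load is acceptable:

    -- drop the primary key (and with it its implicit index) before loading
    ALTER TABLE baz DROP CONSTRAINT pk_baz;

    COPY baz (baz_number, create_user, last_update_user)
    FROM '/tmp/baz.dat';

    -- re-create it afterwards: building the index once over the whole
    -- table is cheaper than maintaining it for every inserted row
    ALTER TABLE baz ADD CONSTRAINT pk_baz PRIMARY KEY (baz_number);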