Re: [PERFORM] Very slow queries - please help
Thanks very much - there are a lot of good articles there... Reading as fast as I can :) Best, Bealach From: Thomas F. O'Connell [EMAIL PROTECTED] To: Bealach-na Bo [EMAIL PROTECTED] CC: PgSQL - Performance pgsql-performance@postgresql.org Subject: Re: [PERFORM] Very slow queries - please help Date: Sun, 4 Dec 2005 00:40:01 -0600 On Nov 24, 2005, at 12:14 PM, Bealach-na Bo wrote: The consensus seems to be that I need more indexes and I also need to look into the NOT IN statement as a possible bottleneck. I've introduced the indexes which has led to a DRAMATIC change in response time. Now I have to experiment with INNER JOIN - OUTER JOIN variations, SET ENABLE_SEQSCAN=OFF. Forgive me for not mentioning each person individually and by name. You have all contributed to confirming what I had suspected (and hoped): that *I* have a lot to learn! I'm attaching table descriptions, the first few lines of top output while the queries were running, index lists, sample queries and EXPLAIN ANALYSE output BEFORE and AFTER the introduction of the indexes. As I said, DRAMATIC :) I notice that the CPU usage does not vary very much, it's nearly 100% anyway, but the memory usage drops markedly, which is another very nice result of the index introduction. Any more comments and tips would be very welcome. You might find the following resources from techdocs instructive: http://techdocs.postgresql.org/redir.php?link=/techdocs/ pgsqladventuresep2.php http://techdocs.postgresql.org/redir.php?link=/techdocs/ pgsqladventuresep3.php These documents provide some guidance into the process of index selection. It seems like you could still stand to benefit from more indexes based on your queries, table definitions, and current indexes. -- Thomas F. O'Connell Database Architecture and Programming Co-Founder Sitening, LLC http://www.sitening.com/ 110 30th Avenue North, Suite 6 Nashville, TN 37203-6320 615-260-0005 (cell) 615-469-5150 (office) 615-469-5151 (fax) ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [PERFORM] Very slow queries - please help
On Nov 24, 2005, at 12:14 PM, Bealach-na Bo wrote: The consensus seems to be that I need more indexes and I also need to look into the NOT IN statement as a possible bottleneck. I've introduced the indexes which has led to a DRAMATIC change in response time. Now I have to experiment with INNER JOIN - OUTER JOIN variations, SET ENABLE_SEQSCAN=OFF. Forgive me for not mentioning each person individually and by name. You have all contributed to confirming what I had suspected (and hoped): that *I* have a lot to learn! I'm attaching table descriptions, the first few lines of top output while the queries were running, index lists, sample queries and EXPLAIN ANALYSE output BEFORE and AFTER the introduction of the indexes. As I said, DRAMATIC :) I notice that the CPU usage does not vary very much, it's nearly 100% anyway, but the memory usage drops markedly, which is another very nice result of the index introduction. Any more comments and tips would be very welcome. You might find the following resources from techdocs instructive: http://techdocs.postgresql.org/redir.php?link=/techdocs/ pgsqladventuresep2.php http://techdocs.postgresql.org/redir.php?link=/techdocs/ pgsqladventuresep3.php These documents provide some guidance into the process of index selection. It seems like you could still stand to benefit from more indexes based on your queries, table definitions, and current indexes. -- Thomas F. O'Connell Database Architecture and Programming Co-Founder Sitening, LLC http://www.sitening.com/ 110 30th Avenue North, Suite 6 Nashville, TN 37203-6320 615-260-0005 (cell) 615-469-5150 (office) 615-469-5151 (fax) ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
[PERFORM] Very slow queries - please help.
Hi Folks, I'm new to Postgresql. I'm having great difficulties getting the performance I had hoped for from Postgresql 8.0. The typical query below takes ~20 minutes !! I hope an expert out there will tell me what I'm doing wrong - I hope *I* am doing something wrong. Hardware Single processor, Intel Xeon 3.06 GHz machine running Red Hat Ent. 4. with 1.5 GB of RAM. The machine is dedicated to running Postgresql 8.0 and Apache/mod_perl etc. The database is being accessed for report generation via a web form. The web server talks to Pg over TCP/IP (I know, that I don't need to do this if they are all on the same machine, but I have good reasons for this and don't suspect that this is where my problems are - I have the same poor performance when running from psql on the server.) Database Very simple, not fully normalized set of two tables. The first table, very small (2000 lines of 4 cols with very few chars and integers in in col). The other quite a bit larger (50 lines with 15 cols. with the largest fields ~ 256 chars) Typical query SELECT n.name FROM node n WHERE n.name LIKE '56x%' AND n.type='H' AND n.usage='TEST' AND n.node_id NOT IN (select n.node_id FROM job_log j INNER JOIN node n ON j.node_id = n.node_id WHERE n.name LIKE '56x%' AND n.type='H' AND n.usage='TEST' AND j.job_name = 'COPY FILES' AND j.job_start = '2005-11-14 00:00:00' AND (j.job_stop = '2005-11-22 09:31:10' OR j.job_stop IS NULL)) ORDER BY n.name The node table is the small table and the job_log table is the large table. I've tried all the basic things that I found in the documentation like VACUUM ANALYZE, EXPLAIN etc., but I suspect there is something terribly wrong with what I'm doing and these measures will not shave off 19 min and 50 seconds off the query time. Any help and comments would be very much appreciated. Bealach ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] Very slow queries - please help.
Typical query SELECT n.name FROM node n WHERE n.name LIKE '56x%' AND n.type='H' AND n.usage='TEST' AND n.node_id NOT IN (select n.node_id FROM job_log j INNER JOIN node n ON j.node_id = n.node_id WHERE n.name LIKE '56x%' AND n.type='H' AND n.usage='TEST' AND j.job_name = 'COPY FILES' AND j.job_start = '2005-11-14 00:00:00' AND (j.job_stop = '2005-11-22 09:31:10' OR j.job_stop IS NULL)) ORDER BY n.name Do you have any indexes? regards Claus ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] Very slow queries - please help.
Hi, Thanks for your comments. I've explicitly made any indexes, but the default ones are: [EMAIL PROTECTED] \di List of relations Schema | Name | Type | Owner | Table -+-+---+-+- user | job_log_id_pkey | index | user | job_log user | node_id_pkey| index | user | node user | node_name_key | index | user | node (3 rows) I'm also sending the EXPLAIN outputs. explain SELECT n.name,n.type, n.usage, j.status, j.job_start,j.job_stop, j.nfiles_in_job,j.job_name FROM job_log j INNER JOIN node n ON j.node_id = n.node_id WHERE n.name LIKE '56x%' AND n.type = 'K' AND n.usage = 'LIVE' AND j.job_name = 'COPY FILES' AND j.job_start = '2005-11-14 00:00:00' AND (j.job_stop = '2005-11-14 05:00:00' OR j.job_stop IS NULL) ORDER BY n.name; QUERY PLAN -- Nested Loop (cost=0.00..75753.31 rows=1 width=461) Join Filter: (inner.node_id = outer.node_id) - Index Scan using node_name_key on node n (cost=0.00..307.75 rows=1 width=181) Filter: ((name ~~ '56x%'::text) AND (type = 'K'::bpchar) AND (usage = 'LIVE'::bpchar)) - Seq Scan on job_log j (cost=0.00..75445.54 rows=1 width=288) Filter: ((job_name = 'COPY FILES'::bpchar) AND (job_start = '2005-11-14 00:00:00'::timestamp without time zone) AND ((job_stop = '2005-11-14 05:00:00'::timestamp without time zone) OR (job_stop IS NULL))) (6 rows) explain SELECT n.name, n.type, n.usage FROM node n WHERE n.name LIKE '56x%' AND n.type = 'K' AND n.usage = 'LIVE' AND n.node_id NOT IN (SELECT n.node_id FROM job_log j INNER JOIN node n ON j.node_id = n.node_id WHERE n.name LIKE '56x%' AND n.type = 'K' AND n.usage = 'LIVE' AND j.job_name = 'COPY FILES' AND j.job_start = '2005-11-14 00:00:00' AND (j.job_stop = '2005-11-14 05:00:00' OR j.job_stop IS NULL)) ORDER BY n.name; QUERY PLAN -- Index Scan using node_name_key on node n (cost=75451.55..75764.94 rows=1 width=177) Filter: ((name ~~ '56x%'::text) AND (type = 'K'::bpchar) AND (usage = 'LIVE'::bpchar) AND (NOT (hashed subplan))) SubPlan - Nested Loop (cost=0.00..75451.54 rows=1 width=4) - Seq Scan on job_log j (cost=0.00..75445.54 rows=1 width=4) Filter: ((job_name = 'COPY FILES'::bpchar) AND (job_start = '2005-11-14 00:00:00'::timestamp without time zone) AND ((job_stop = '2005-11-14 05:00:00'::timestamp without time zone) OR (job_stop IS NULL))) - Index Scan using node_id_pkey on node n (cost=0.00..5.99 rows=1 width=4) Index Cond: (outer.node_id = n.node_id) Filter: ((name ~~ '56x%'::text) AND (type = 'K'::bpchar) AND (usage = 'LIVE'::bpchar)) Yours, Bealach From: Claus Guttesen [EMAIL PROTECTED] To: Bealach-na Bo [EMAIL PROTECTED] CC: pgsql-performance@postgresql.org Subject: Re: [PERFORM] Very slow queries - please help. Date: Thu, 24 Nov 2005 14:23:38 +0100 Typical query SELECT n.name FROM node n WHERE n.name LIKE '56x%' AND n.type='H' AND n.usage='TEST' AND n.node_id NOT IN (select n.node_id FROM job_log j INNER JOIN node n ON j.node_id = n.node_id WHERE n.name LIKE '56x%' AND n.type='H' AND n.usage='TEST' AND j.job_name = 'COPY FILES' AND j.job_start = '2005-11-14 00:00:00' AND (j.job_stop = '2005-11-22 09:31:10' OR j.job_stop IS NULL)) ORDER BY n.name Do you have any indexes? regards Claus ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] Very slow queries - please help.
Hi, I'm also sending the EXPLAIN outputs. Please provide EXPLAIN ANALYZE outputs instead of EXPLAIN. You will have more information. Indexes on your tables are obviously missing. You should try to add: CREATE INDEX idx_node_filter ON node(name, type, usage); CREATE INDEX idx_job_log_filter ON job_log(job_name, job_start, job_stop); I'm not so sure it's a good idea to add job_stop in this index as you have an IS NULL in your query so I'm not sure it can be used. You should try it anyway and remove it if not needed. I added all your search fields in the indexes but it depends a lot on the selectivity of your conditions. I don't know your data but I think you understand the idea. HTH -- Guillaume ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [PERFORM] Very slow queries - please help.
Bealach-na Bo [EMAIL PROTECTED] writes: I'm having great difficulties getting the performance I had hoped for from Postgresql 8.0. The typical query below takes ~20 minutes !! You need to show us the table definition (including indexes) and the EXPLAIN ANALYZE results for the query. It seems likely that the NOT IN is the source of your problems, but it's hard to be sure without EXPLAIN results. regards, tom lane ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [PERFORM] Very slow queries - please help
OK. The consensus seems to be that I need more indexes and I also need to look into the NOT IN statement as a possible bottleneck. I've introduced the indexes which has led to a DRAMATIC change in response time. Now I have to experiment with INNER JOIN - OUTER JOIN variations, SET ENABLE_SEQSCAN=OFF. Forgive me for not mentioning each person individually and by name. You have all contributed to confirming what I had suspected (and hoped): that *I* have a lot to learn! I'm attaching table descriptions, the first few lines of top output while the queries were running, index lists, sample queries and EXPLAIN ANALYSE output BEFORE and AFTER the introduction of the indexes. As I said, DRAMATIC :) I notice that the CPU usage does not vary very much, it's nearly 100% anyway, but the memory usage drops markedly, which is another very nice result of the index introduction. Any more comments and tips would be very welcome. Thank you all for your input. Bealach. [EMAIL PROTECTED] \d job_log Table blouser.job_log Column |Type |Modifiers +-+-- job_log_id | integer | not null default nextval('job_log_id_seq'::text) first_registry | timestamp without time zone | blogger_name | character(50) | node_id| integer | job_type | character(50) | job_name | character(256) | job_start | timestamp without time zone | job_timeout| interval| job_stop | timestamp without time zone | nfiles_in_job | integer | status | integer | error_code | smallint| Indexes: job_log_id_pkey PRIMARY KEY, btree (job_log_id) Check constraints: job_log_status_check CHECK (status = 0 OR status = 1 OR status = 8 OR status = 9) Foreign-key constraints: legal_node FOREIGN KEY (node_id) REFERENCES node(node_id) [EMAIL PROTECTED] \d node Table blouser.node Column | Type | Modifiers -+---+--- node_id | integer | not null default nextval('node_id_seq'::text) name| character(50) | type| character(1) | usage | character(4) | Indexes: node_id_pkey PRIMARY KEY, btree (node_id) node_name_key UNIQUE, btree (name) Check constraints: node_type_check CHECK (type = 'B'::bpchar OR type = 'K'::bpchar OR type = 'C'::bpchar OR type = 'T'::bpchar OR type = 'R'::bpchar) node_usage_check CHECK (usage = 'TEST'::bpchar OR usage = 'LIVE'::bpchar) #before new indexes were created Tasks: 114 total, 2 running, 112 sleeping, 0 stopped, 0 zombie Cpu(s): 25.7% us, 24.5% sy, 0.0% ni, 49.4% id, 0.3% wa, 0.0% hi, 0.0% si Mem: 1554788k total, 1513576k used,41212k free,31968k buffers Swap: 1020024k total,27916k used, 992108k free, 708728k cached PID USER PR NI VIRT RES SHR S %CPU %MEMTIME+ COMMAND 25883 postgres 25 0 20528 12m 11m R 99.7 0.8 4:54.91 postmaster [EMAIL PROTECTED] \di List of relations Schema | Name | Type | Owner | Table -+-+---+-+- blouser | job_log_id_pkey | index | blouser | job_log blouser | node_id_pkey| index | blouser | node blouser | node_name_key | index | blouser | node (3 rows) EXPLAIN ANALYSE SELECT n.name,n.type, n.usage, j.status, j.job_start,j.job_stop, j.nfiles_in_job,j.job_name FROM job_log j INNER JOIN node n ON j.node_id = n.node_id WHERE n.name LIKE '711%' AND n.type = 'K' AND n.usage = 'LIVE' AND j.job_name = 'COPY FILES' AND j.job_start = '2005-11-14 00:00:00' AND (j.job_stop = '2005-11-14 05:00:00' OR j.job_stop IS NULL) ORDER BY n.name; QUERY PLAN --- Nested Loop (cost=0.00..75753.31 rows=1 width=461) (actual time=270486.692..291662.350 rows=3 loops=1) Join Filter: (inner.node_id = outer.node_id) - Index Scan using node_name_key on node n (cost=0.00..307.75 rows=1 width=181) (actual time=0.135..11.034 rows=208 loops=1) Filter: ((name ~~ '711%'::text) AND (type = 'K'::bpchar) AND (usage = 'LIVE'::bpchar)) - Seq Scan on job_log j (cost=0.00..75445.54 rows=1 width=288) (actual time=273.374..1402.089 rows=22 loops=208) Filter: ((job_name = 'COPY FILES'::bpchar) AND (job_start = '2005-11-14 00:00:00'::timestamp without time zone) AND ((job_stop = '2005-11-14
[PERFORM] Very slow queries - please help
A quick note to say that I'm very grateful for Tom Lane's input also. Tom, I did put you on the list of recipients for my last posting to pgsql-performance, but got: cut here This is an automatically generated Delivery Status Notification. Delivery to the following recipients failed. [EMAIL PROTECTED] Many regards, Bealach ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly