Re: [PERFORM] Partial index usage
On Feb 16, 2009, at 9:07 AM, Craig Ringer wrote:
>> CREATE INDEX uidx_product_partno_producer_id ON product
>>   USING btree (partno, producer_id);
>> CREATE INDEX idx_product_partno ON product USING btree (partno);
>>
>> Can I safely delete the second one?
>
> You can safely delete BOTH in that it won't hurt your data, only
> potentially hurt performance. Deleting the index on (partno) should
> somewhat improve insert performance and performance on updates that
> can't be done via HOT.
>
> However, the index on (partno, producer_id) requires more storage and
> memory than the index on just (partno). AFAIK it's considerably
> slower to scan.

Actually, that's not necessarily true. If both partno and producer_id
are ints and you're on a 64-bit platform, there won't be any change in
index size, due to alignment issues.
--
Decibel!, aka Jim C. Nasby, Database Architect  deci...@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
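For what it's worth, a hedged sketch of how one might check Jim's
alignment point before dropping anything. It uses the index and table
names from this thread; the partno value is hypothetical, and
pg_relation_size/pg_size_pretty assume a release new enough to have
them (8.1+):

```sql
-- Compare on-disk sizes of the two indexes: on a 64-bit platform with
-- two int4 key columns they may come out equal because of alignment
-- padding, as noted above.
SELECT pg_size_pretty(pg_relation_size('uidx_product_partno_producer_id'));
SELECT pg_size_pretty(pg_relation_size('idx_product_partno'));

-- Confirm the composite index still serves a partno-only query
-- ('ABC-123' is a made-up part number):
EXPLAIN ANALYZE SELECT * FROM product WHERE partno = 'ABC-123';
```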
Re: [PERFORM] Partial index usage
Laszlo Nagy wrote:
> Hi All,
>
> I have these indexes on a table:
>
> CREATE INDEX uidx_product_partno_producer_id ON product
>   USING btree (partno, producer_id);
> CREATE INDEX idx_product_partno ON product USING btree (partno);
>
> Can I safely delete the second one?

You can safely delete BOTH in that it won't hurt your data, only
potentially hurt performance. Deleting the index on (partno) should
somewhat improve insert performance and performance on updates that
can't be done via HOT.

However, the index on (partno, producer_id) requires more storage and
memory than the index on just (partno). AFAIK it's considerably slower
to scan.

Within a transaction, drop the second index then run the query of
interest with EXPLAIN ANALYZE to determine just how much slower - then
ROLLBACK to undo the index drop. You'll lock out other transactions
while you're doing this, but you won't make any permanent changes and
you can cancel it at any time.

> Will postgresql use (partno, producer_id) when it only needs to
> order by partno?

Yes.

> I guess if I only had one index, it would save memory and increase
> performance.

Maybe. If they both fit into memory along with the main table data,
then you might end up losing instead, since the second index is
smaller and should be somewhat faster to scan.

> am_upload_status_id is also an int4. Can I delete the second index
> without performance drawback?

Same answer as above - test it and find out. You may win or lose
depending on your workload, table sizes, available memory, etc.

--
Craig Ringer
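Craig's test-inside-a-transaction idea can be sketched concretely
(table and index names from the thread; the query and part number are
hypothetical stand-ins for "the query of interest"):

```sql
-- The DROP INDEX is rolled back, so nothing permanent happens, but
-- note that other writers to the table are locked out until ROLLBACK.
BEGIN;
DROP INDEX idx_product_partno;
EXPLAIN ANALYZE SELECT * FROM product WHERE partno = 'ABC-123';
ROLLBACK;  -- the index is back, untouched
```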
Re: [PERFORM] Forcing index usage without 'enable_hashjoin = FALSE'
On 23 Aug 2006 at 13:31, Chris wrote:
> Dan Langille wrote:
>> I'm using PostgreSQL 8.1.4 and I'm trying to force the planner to
>> use an index. With the index, I get execution times of 0.5 seconds.
>> Without, it's closer to 2.5 seconds.
>>
>> Compare these two sets of results (also provided at
>> http://rafb.net/paste/results/ywcOZP66.html should it appear poorly
>> formatted below):
>>
>> freshports.org=# \i test2.sql
>>                              QUERY PLAN
>> ----------------------------------------------------------------------
>>  Merge Join  (cost=24030.39..24091.43 rows=3028 width=206) (actual time=301.301..355.261 rows=3149 loops=1)
>>    Merge Cond: ("outer".id = "inner".category_id)
>>    ->  Sort  (cost=11.17..11.41 rows=97 width=4) (actual time=0.954..1.300 rows=95 loops=1)
>>          Sort Key: c.id
>>          ->  Seq Scan on categories c  (cost=0.00..7.97 rows=97 width=4) (actual time=0.092..0.517 rows=97 loops=1)
>>    ->  Sort  (cost=24019.22..24026.79 rows=3028 width=206) (actual time=300.317..314.114 rows=3149 loops=1)
>>          Sort Key: p.category_id
>>          ->  Nested Loop  (cost=0.00..23844.14 rows=3028 width=206) (actual time=0.082..264.459 rows=3149 loops=1)
>>                ->  Seq Scan on ports p  (cost=0.00..6141.11 rows=3028 width=206) (actual time=0.026..133.575 rows=3149 loops=1)
>>                      Filter: (status = 'D'::bpchar)
>>                ->  Index Scan using element_pkey on element e  (cost=0.00..5.83 rows=1 width=4) (actual time=0.022..0.026 rows=1 loops=3149)
>>                      Index Cond: ("outer".element_id = e.id)
>>  Total runtime: 369.869 ms
>> (13 rows)
>>
>> freshports.org=# set enable_hashjoin = true;
>> SET
>> freshports.org=# \i test2.sql
>>                              QUERY PLAN
>> ----------------------------------------------------------------------
>>  Hash Join  (cost=6156.90..13541.14 rows=3028 width=206) (actual time=154.741..2334.366 rows=3149 loops=1)
>>    Hash Cond: ("outer".category_id = "inner".id)
>>    ->  Hash Join  (cost=6148.68..13472.36 rows=3028 width=206) (actual time=153.801..2288.792 rows=3149 loops=1)
>>          Hash Cond: ("outer".id = "inner".element_id)
>>          ->  Seq Scan on element e  (cost=0.00..4766.70 rows=252670 width=4) (actual time=0.022..1062.626 rows=252670 loops=1)
>>          ->  Hash  (cost=6141.11..6141.11 rows=3028 width=206) (actual time=151.105..151.105 rows=3149 loops=1)
>>                ->  Seq Scan on ports p  (cost=0.00..6141.11 rows=3028 width=206) (actual time=0.027..131.072 rows=3149 loops=1)
>>                      Filter: (status = 'D'::bpchar)
>>    ->  Hash  (cost=7.97..7.97 rows=97 width=4) (actual time=0.885..0.885 rows=97 loops=1)
>>          ->  Seq Scan on categories c  (cost=0.00..7.97 rows=97 width=4) (actual time=0.076..0.476 rows=97 loops=1)
>>  Total runtime: 2346.877 ms
>> (11 rows)
>>
>> freshports.org=#
>>
>> Without leaving enable_hashjoin = false, can you suggest a way to
>> force the index usage?
>>
>> FYI, the query is:
>>
>> explain analyse
>> SELECT P.id, P.category_id, P.version as version,
>>        P.revision as revision, P.element_id, P.maintainer,
>>        P.short_description,
>>        to_char(P.date_added - SystemTimeAdjust(),
>>                'DD Mon HH24:MI:SS') as date_added,
>>        P.last_commit_id as last_change_log_id, P.package_exists,
>>        P.extract_suffix, P.homepage, P.status, P.broken,
>>        P.forbidden, P.ignore, P.restricted, P.deprecated,
>>        P.no_cdrom, P.expiration_date, P.latest_link
>>   FROM categories C, ports P JOIN element E on P.element_id = E.id
>>  WHERE P.status = 'D'
>>    AND P.category_id = C.id;
>
> I doubt it would make a difference but if you:
>
> ... FROM categories C JOIN ports P on P.category_id=C.id
>     JOIN element E on P.element_id = E.id
> WHERE P.status = 'D';
>
> does it change anything?

Not really, no:

freshports.org=# \i test3.sql
                             QUERY PLAN
----------------------------------------------------------------------
 Hash Join  (cost=5344.62..12740.73 rows=3365 width=204) (actual time=63.871..2164.880 rows=3149 loops=1)
   Hash Cond: ("outer".category_id = "inner".id)
   ->  Hash Join  (cost=5336.41..12665.22 rows=3365 width=204) (actual time=62.918..2122.529 rows=3149 loops=1)
         Hash Cond: ("outer".id = "inner".element_id)
         ->  Seq Scan on element e  (cost=0.00..4767.58 rows=252758 width=4) (actual time=0.019..1024.299 rows=252791 loops=1)
         ->  Hash  (cost=5328.00..5328.00 rows=3365 width=204) (actual time=60.228..60.228 rows=3149
Re: [PERFORM] Forcing index usage without 'enable_hashjoin = FALSE'
Dan Langille <[EMAIL PROTECTED]> writes:
> Without leaving enable_hashjoin = false, can you suggest a way to
> force the index usage?

Have you tried reducing random_page_cost?

FYI, 8.2 should be a bit better about this.

			regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
       choose an index scan if your joining column's datatypes do not
       match
Re: [PERFORM] Forcing index usage without 'enable_hashjoin = FALSE'
On 23 Aug 2006 at 22:30, Tom Lane wrote:
> Dan Langille <[EMAIL PROTECTED]> writes:
>> Without leaving enable_hashjoin = false, can you suggest a way to
>> force the index usage?
>
> Have you tried reducing random_page_cost?

Yes. No effect.

> FYI, 8.2 should be a bit better about this.

Good. This query is not critical, but it would be nice. Thank you.

--
Dan Langille : Software Developer looking for work
my resume: http://www.freebsddiary.org/dan_langille.php
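The experiment Tom suggests can be sketched as follows. The value 2.0
is an arbitrary trial value, not a recommendation (the default of this
era was 4.0), and SET only affects the current session, so nothing
persists:

```sql
-- Lower random_page_cost for this session and re-check the plan.
SET random_page_cost = 2.0;
\i test2.sql              -- or re-run the EXPLAIN ANALYSE by hand
RESET random_page_cost;   -- back to the configured default
```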
Re: [PERFORM] Forcing index usage without 'enable_hashjoin = FALSE'
Dan Langille wrote:
> I'm using PostgreSQL 8.1.4 and I'm trying to force the planner to
> use an index. With the index, I get execution times of 0.5 seconds.
> Without, it's closer to 2.5 seconds.
>
> Compare these two sets of results (also provided at
> http://rafb.net/paste/results/ywcOZP66.html should it appear poorly
> formatted below):
>
> [merge join and hash join EXPLAIN ANALYSE output quoted in full
> above]
>
> Without leaving enable_hashjoin = false, can you suggest a way to
> force the index usage?
>
> FYI, the query is:
>
> [query quoted in full above]

I doubt it would make a difference but if you:

... FROM categories C JOIN ports P on P.category_id=C.id
    JOIN element E on P.element_id = E.id
WHERE P.status = 'D';

does it change anything?

--
Postgresql php tutorials
http://www.designmagick.com/
Re: [PERFORM] FW: Index usage
Hi, what do you mean by increasing the statistics on the date column?

We never had any upgrade on it.

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of gnari
Sent: Thursday, November 25, 2004 3:13 AM
To: BBI Edwin Punzalan; [EMAIL PROTECTED]
Subject: Re: [PERFORM] FW: Index usage

From: "BBI Edwin Punzalan" <[EMAIL PROTECTED]>

> Yes, the database is being vacuum-ed and analyzed on a daily basis.

then you should consider increasing the statistics on the date column,
as the estimates were a bit off in the plan

> Our version is 7.2.1

upgrade time ?

gnari
Re: [PERFORM] FW: Index usage
From: "BBI Edwin Punzalan" <[EMAIL PROTECTED]>

> Hi, what do you mean by increasing the statistics on the date column?

alter table chatlogs alter column date set statistics 300;
analyze chatlogs;

>> Our version is 7.2.1
>
>> upgrade time ?
>
> We never had any upgrade on it.

7.2 is a bit dated now that 8.0 is in beta. If you want to stay with
7.2, you should at least upgrade to the latest point release (7.2.6 ?),
as several serious bugs have been fixed.

gnari
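A hedged sketch of how the effect of gnari's suggestion can be
inspected afterwards, using the chatlogs table from this thread (the
pg_stats view and these column names exist in releases of this era,
but treat the exact output as version-dependent):

```sql
-- Raise the per-column statistics target and rebuild the stats.
ALTER TABLE chatlogs ALTER COLUMN date SET STATISTICS 300;
ANALYZE chatlogs;

-- Inspect what the planner now believes about the column.
SELECT n_distinct, most_common_vals, histogram_bounds
  FROM pg_stats
 WHERE tablename = 'chatlogs' AND attname = 'date';
```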
Re: [PERFORM] FW: Index usage
Thanks but whatever it does, it didn't work. :D  Do you think
upgrading will fix this problem?

==================
db=# alter table chatlogs alter column date set statistics 300;
ALTER
db=# analyze chatlogs;
ANALYZE
db=# explain analyze select * from chatlogs where date >= '12/1/04';
NOTICE:  QUERY PLAN:

Index Scan using chatlogs_type_idx on chatlogs  (cost=0.00..6053.61
rows=3357 width=212) (actual time=22.14..138.53 rows=1312 loops=1)
Total runtime: 139.42 msec

EXPLAIN
morphTv=# explain analyze select * from chatlogs where date >= '11/03/04';
NOTICE:  QUERY PLAN:

Seq Scan on chatlogs  (cost=0.00..27252.86 rows=271882 width=212)
(actual time=12.24..13419.36 rows=257137 loops=1)
Total runtime: 13573.70 msec

EXPLAIN
==================

-----Original Message-----
From: gnari [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, December 01, 2004 10:08 AM
To: BBI Edwin Punzalan; [EMAIL PROTECTED]
Subject: Re: [PERFORM] FW: Index usage

[gnari's message of 10:08 AM, quoted in full above]
Re: [PERFORM] FW: Index usage
If it's any help, I just ran this test on 7.4.6. My table has about
700 rows and the index is an integer. The item id ranges from 1 to 2.

As you can see from the following plans, the optimizer changed its
plan depending on the value of the item id condition, and will use an
index when it determines that the number of values that will be
returned is a low % of the total table size.

The item_id is an integer, but it looked like you are using a
character field to store date information. Also, the dates you entered
in your test case seem to be in the format DD/MM/YY, which won't be
amenable to useful comparative searching (I didn't read any of the
earlier posts, so if that isn't the case, just ignore this). If this
is the case, try storing the data in a date column and see what
happens then.

regards
Iain

test=# explain analyse select * from bigtable where item_id = 1000;
                              QUERY PLAN
----------------------------------------------------------------------
 Index Scan using d_bigtable_idx2 on bigtable  (cost=0.00..118753.57
 rows=59553 width=80) (actual time=0.069..704.401 rows=58102 loops=1)
   Index Cond: ((item_id)::integer = 1000)
 Total runtime: 740.786 ms
(3 rows)

test=# explain analyse select * from bigtable where item_id = 1;
                              QUERY PLAN
----------------------------------------------------------------------
 Seq Scan on d_hi_mise_item_uri  (cost=0.00..194285.15 rows=7140589
 width=80) (actual time=0.027..18599.032 rows=7114844 loops=1)
   Filter: ((item_id)::integer = 1)
 Total runtime: 23024.986 ms

----- Original Message -----
From: "BBI Edwin Punzalan" <[EMAIL PROTECTED]>
To: "'gnari'" <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>
Sent: Wednesday, December 01, 2004 11:33 AM
Subject: Re: [PERFORM] FW: Index usage

> Thanks but whatever it does, it didn't work. :D  Do you think
> upgrading will fix this problem?
>
> [statistics test and EXPLAIN ANALYZE output quoted in full above]
Re: [PERFORM] FW: Index usage
Hi. Thanks for your reply. The date column data type is date
already. :D

-----Original Message-----
From: Iain [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, December 01, 2004 12:00 PM
To: BBI Edwin Punzalan; 'gnari'; [EMAIL PROTECTED]
Subject: Re: [PERFORM] FW: Index usage

> If it's any help, I just ran this test on 7.4.6. My table has about
> 700 rows and the index is an integer. The item id ranges from 1 to 2.
>
> [Iain's message of 12:00 PM, quoted in full above]
Re: [PERFORM] FW: Index usage
Sorry, I can't check this easily as I don't have any date fields in my
data (they are all held as character strings - do as I say, not as I
do), but maybe you should cast or convert the string representation of
the date to a date in the where clause. Postgres might be doing some
implicit conversion, but if it is, I'd expect it to use a YYYY-MM-DD
format, which is what I see here.

Something like:

... WHERE date >= to_date('11/03/04','DD/MM/YY')

regards
Iain

----- Original Message -----
From: "BBI Edwin Punzalan" <[EMAIL PROTECTED]>
To: "'Iain'" <[EMAIL PROTECTED]>; "'gnari'" <[EMAIL PROTECTED]>;
<[EMAIL PROTECTED]>
Sent: Wednesday, December 01, 2004 1:05 PM
Subject: RE: [PERFORM] FW: Index usage

> Hi. Thanks for your reply. The date column data type is date
> already. :D
>
> [earlier messages in the thread, quoted in full above]
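Iain's suggestion can be sketched against the chatlogs table from this
thread. Note the format token is an assumption: Edwin's literals like
'12/1/04' look like MM/DD/YY, so the token must match however the
strings are actually written (or, better, use unambiguous ISO date
literals):

```sql
-- Make the comparison explicitly date-typed on both sides, so no
-- implicit text conversion is involved.
EXPLAIN ANALYZE
SELECT *
  FROM chatlogs
 WHERE date >= to_date('11/03/04', 'MM/DD/YY')
   AND date <  to_date('11/05/04', 'MM/DD/YY');
```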
Re: [PERFORM] FW: Index usage
From: "BBI Edwin Punzalan" <[EMAIL PROTECTED]>

> db=# explain analyze select date from chatlogs where
> date>='11/23/04' and date<'11/25/04';
> NOTICE:  QUERY PLAN:
>
> Seq Scan on chatlogs  (cost=0.00..24763.19 rows=9200 width=4)
> (actual time=0.45..4268.00 rows=23787 loops=1)
> Total runtime: 4282.81 msec
> ==============================
>
> How come a query on the current date filter uses an index and the
> others do not? This makes indexing to speed up queries quite
> difficult.

have you ANALYZED the table lately ?
what version postgres are you using ?

gnari
Re: [PERFORM] FW: Index usage
Yes, the database is being vacuum-ed and analyzed on a daily basis.

Our version is 7.2.1

-----Original Message-----
From: gnari [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, November 24, 2004 4:35 PM
To: BBI Edwin Punzalan; [EMAIL PROTECTED]
Subject: Re: [PERFORM] FW: Index usage

[gnari's questions, quoted in full above]
Re: [PERFORM] FW: Index usage
From: "BBI Edwin Punzalan" <[EMAIL PROTECTED]>

> Yes, the database is being vacuum-ed and analyzed on a daily basis.

then you should consider increasing the statistics on the date column,
as the estimates were a bit off in the plan

> Our version is 7.2.1

upgrade time ?

gnari
Re: [PERFORM] FW: Index usage
Well, you just selected a whole lot more rows... What's the total
number of rows in the table?

In general, what I remember from reading on the list is that when
there's no upper bound on a query like this, the planner is more
likely to choose a seq. scan than an index scan. Try to give your
query an upper bound like:

select date from chatlogs where date>='11/23/04' and date<'12/31/99';
select date from chatlogs where date>='10/23/04' and date<'12/31/99';

This should make it easier for the planner to give a proper estimate
of the number of rows returned. If it doesn't help yet, please post
'explain analyze' output rather than 'explain' output, for it allows
much better investigation into why the planner chooses what it
chooses.

cheers,

--Tim

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of
BBI Edwin Punzalan
Sent: Wednesday, November 24, 2004 7:52 AM
To: [EMAIL PROTECTED]
Subject: [PERFORM] FW: Index usage

Hi everyone,

Can anyone please explain postgres' behavior on our index. I did the
following query tests on our database:

db=# create index chatlogs_date_idx on chatlogs (date);
CREATE
db=# explain select date from chatlogs where date>='11/23/04';
NOTICE:  QUERY PLAN:

Index Scan using chatlogs_date_idx on chatlogs  (cost=0.00..144.11
rows=36 width=4)

EXPLAIN
db=# explain select date from chatlogs where date>='10/23/04';
NOTICE:  QUERY PLAN:

Seq Scan on chatlogs  (cost=0.00..23938.06 rows=253442 width=4)

EXPLAIN

Date's datatype is date. It's just odd that I just change the actual
date of search and the index is not being used anymore.
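Tim's bounded-range idea, written out as a sketch against the thread's
chatlogs table. ISO date literals are used here for clarity (an
assumption; Edwin's examples use 'MM/DD/YY' strings):

```sql
-- Closing the range on both ends lets the planner estimate the row
-- count from the histogram instead of assuming a large open-ended
-- scan, which is what tips it toward a seq scan.
EXPLAIN ANALYZE
SELECT date
  FROM chatlogs
 WHERE date >= DATE '2004-10-23'
   AND date <  DATE '2004-11-26';
```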
Re: [PERFORM] No index usage with left join
Cannot you do a cast in your query? Does that help with using the
indexes?

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of
[EMAIL PROTECTED]
Sent: Monday, 2 August 2004 14:09
To: [EMAIL PROTECTED]
Subject: [PERFORM] No index usage with left join

We have a companies and a contacts table with about 3000 records each.

We run the following SQL command, which runs for about 2 MINUTES!:

SELECT count(*) FROM contacts LEFT JOIN companies ON contacts.sid =
companies.intfield01

contacts.sid (type text, b-tree index on it)
companies.intfield01 (type bigint, b-tree index on it)

comfire=> explain analyze SELECT count(*) FROM prg_contacts LEFT JOIN
prg_addresses ON prg_contacts.sid=prg_addresses.intfield01;
NOTICE:  QUERY PLAN:

Aggregate  (cost=495261.02..495261.02 rows=1 width=15) (actual
time=40939.38..40939.38 rows=1 loops=1)
  ->  Nested Loop  (cost=0.00..495253.81 rows=2885 width=15) (actual
      time=0.05..40930.14 rows=2866 loops=1)
        ->  Seq Scan on prg_contacts  (cost=0.00..80.66 rows=2866
            width=7) (actual time=0.01..18.10 rows=2866 loops=1)
        ->  Seq Scan on prg_addresses  (cost=0.00..131.51 rows=2751
            width=8) (actual time=0.03..6.25 rows=2751 loops=2866)
Total runtime: 40939.52 msec

EXPLAIN

Note:
- We need the left join because we need all contacts, even if they
  are not assigned to a company
- We are not able to change the datatypes of the joined fields
  because we use a standard software (btw who cares: SuSE Open
  Exchange Server)
- When we use a normal join (without LEFT or a where clause) the SQL
  runs immediately, using the indexes

How can I force the usage of the indexes when using left join? Or any
other SQL construct that does the same!? Can anybody please give us a
hint!? Thanks in advance.

Greetings
Achim
Re: [PERFORM] No index usage with
TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match Greetz, Guido Cannot you do a cast in your query? Does that help with using the indexes? -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] Behalf Of [EMAIL PROTECTED] Sent: maandag 2 augustus 2004 14:09 To: [EMAIL PROTECTED] Subject: [PERFORM] No index usage with left join We have a companies and a contacts table with about 3000 records each. We run the following SQL-Command which runs about 2 MINUTES !: SELECT count(*) FROM contacts LEFT JOIN companies ON contacts.sid = companies.intfield01 contacts.sid (type text, b-tree index on it) companies.intfield01 (type bigint, b-tree index on it) comfire= explain analyze SELECT count(*) FROM prg_contacts LEFT JOIN prg_addresses ON prg_contacts.sid=prg_addresses.intfield01; NOTICE: QUERY PLAN: Aggregate (cost=495261.02..495261.02 rows=1 width=15) (actual time=40939.38..40939.38 rows=1 loops=1) - Nested Loop (cost=0.00..495253.81 rows=2885 width=15) (actual time=0.05..40930.14 rows=2866 loops=1) - Seq Scan on prg_contacts (cost=0.00..80.66 rows=2866 width=7) (actual time=0.01..18.10 rows=2866 loops=1) - Seq Scan on prg_addresses (cost=0.00..131.51 rows=2751 width=8) (actual time=0.03..6.25 rows=2751 loops=2866) Total runtime: 40939.52 msec EXPLAIN Note: - We need the left join because we need all contacts even if they are not assigned to a company - We are not able to change the datatypes of the joined fields because we use a standard software (btw who cares: SuSE Open Exchange Server) - When we use a normal join (without LEFT or a where clause) the SQL runs immediately using the indexes How can I force the usage of the indexes when using left join. Or any other SQL construct that does the same !? Can anybody please give us a hint !? Thanks in forward. 
Greetings
Achim

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] No index usage with left join
Guido Barosio wrote:
> TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match

And this is fixed in 7.5/8.0.

--
Bruce Momjian | http://candle.pha.pa.us
[EMAIL PROTECTED] | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 8: explain analyze is your friend
Re: [PERFORM] No index usage with left join
> SELECT count(*) FROM contacts LEFT JOIN companies ON contacts.sid = companies.intfield01
>
> contacts.sid (type text, b-tree index on it)
> companies.intfield01 (type bigint, b-tree index on it)
[snip]
> How can I force the usage of the indexes when using a left join? Or is there any other SQL construct that does the same? Can anybody please give us a hint?

You really don't need to use indexes, since you're fetching all information from both tables. Anyway, we can be fairly sure this isn't PostgreSQL 7.4 (which would likely choose a far better plan -- a hash join rather than a nested loop), as 7.4 won't join a bigint to a text field without a cast. Try this:

set enable_nestloop = false;
SELECT count(*) FROM contacts LEFT JOIN companies ON cast(contacts.sid as bigint) = companies.intfield01;
set enable_nestloop = true;

---(end of broadcast)---
TIP 8: explain analyze is your friend
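The suggested cast-in-the-join can be exercised end to end. The sketch below uses Python's sqlite3 module purely so the snippet is self-contained and runnable; the sample data is made up, and the PostgreSQL-specific enable_nestloop knob is omitted since it has no SQLite equivalent. Note the outer join keeps every contact, so count(*) equals the contact count:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE contacts (sid TEXT)")             # text key, as in the post
con.execute("CREATE TABLE companies (intfield01 INTEGER)")  # bigint key in the post
con.executemany("INSERT INTO contacts VALUES (?)", [("1",), ("2",), ("3",)])
con.executemany("INSERT INTO companies VALUES (?)", [(1,), (2,)])

# LEFT JOIN with an explicit cast, as suggested in the reply above.
# Every contact survives the outer join, matched or not:
n = con.execute(
    "SELECT count(*) FROM contacts LEFT JOIN companies "
    "ON CAST(contacts.sid AS INTEGER) = companies.intfield01"
).fetchone()[0]
print(n)
```

In SQLite the cast only documents intent; in the PostgreSQL versions discussed here it is what makes the join condition plannable at all.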
Re: [PERFORM] No index usage with left join
Rod Taylor [EMAIL PROTECTED] writes:
> > How can I force the usage of the indexes when using a left join?
> Anyway, we can be fairly sure this isn't PostgreSQL 7.4 (which would likely choose a far better plan -- a hash join rather than a nested loop)

Indeed, the lack of any join-condition line in the EXPLAIN output implies it's 7.2 or older. IIRC, 7.4 is the first release that is capable of using a merge or hash join with a condition more complicated than plain "Var = Var". In this case, since the two fields are of different datatypes, the planner sees something like "Var = Var::text" (ie, there's an inserted cast function). 7.2 will just say "duh, too complicated for me" and generate a nestloop. With the columns being of different datatypes, you don't even have a chance for an inner indexscan in the nestloop.

In short: change the column datatypes to be the same, or update to 7.4.something. There are no other solutions. (Well, if you were really desperate you could create a set of mergejoinable "text op bigint" comparison operators, and then 7.2 would be able to cope; but I should think that updating to 7.4 would be much less work.)

regards, tom lane

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings
Re: [PERFORM] No index usage with left join
On Mon, 2004-08-02 at 06:08, [EMAIL PROTECTED] wrote:
> We have a companies and a contacts table with about 3000 records each. We run the following SQL command, which runs for about 2 MINUTES(!):
>
> SELECT count(*) FROM contacts LEFT JOIN companies ON contacts.sid = companies.intfield01
>
> contacts.sid (type text, b-tree index on it)
> companies.intfield01 (type bigint, b-tree index on it)
>
> comfire= explain analyze SELECT count(*) FROM prg_contacts LEFT JOIN prg_addresses ON prg_contacts.sid=prg_addresses.intfield01;
> NOTICE: QUERY PLAN:
>
> Aggregate (cost=495261.02..495261.02 rows=1 width=15) (actual time=40939.38..40939.38 rows=1 loops=1)
>   -> Nested Loop (cost=0.00..495253.81 rows=2885 width=15) (actual time=0.05..40930.14 rows=2866 loops=1)
>     -> Seq Scan on prg_contacts (cost=0.00..80.66 rows=2866 width=7) (actual time=0.01..18.10 rows=2866 loops=1)
>     -> Seq Scan on prg_addresses (cost=0.00..131.51 rows=2751 width=8) (actual time=0.03..6.25 rows=2751 loops=2866)
> Total runtime: 40939.52 msec
>
> EXPLAIN
>
> Note:
> - We need the left join because we need all contacts, even those not assigned to a company.
> - We are not able to change the datatypes of the joined fields, because we use standard software (btw, who cares: SuSE Open Exchange Server).
> - When we use a normal join (without LEFT, or with a where clause) the SQL runs immediately, using the indexes.
>
> How can I force the usage of the indexes when using a left join? Or is there any other SQL construct that does the same? Can anybody please give us a hint?

Why in the world would the database use the index in this case? You're retrieving every single row, so it may as well hit the data store directly. By the way, unlike many other databases that can answer straight from the index, PostgreSQL always has to go back to the data store anyway to get the real value, so if a query is gonna hit more than some small percentage of the rows, it's usually a win to just seq scan. Try restricting your query with a where clause to one or two rows and see what you get.
---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
Re: [PERFORM] no index-usage on aggregate-functions?
@Chris:
> > SELECT count(*) FROM the_table => Seq scan -> takes about 12 sec
> This cannot be made O(1) in postgres due to MVCC. You just have to live with it.

Bad news. BTW: in this case you could work around it with

select reltuples from pg_class where relname='the_table'

(yes, I know: presumes a regular vacuum analyze)

> Average and sum can never use an index AFAIK, in any db server. You need information from every row.

Take a look at the SQL Server counterpart:

create index x_1 on the_table (num_found)
select avg(num_found) from the_table
-> Index Scan(OBJECT:([midata].[dbo].[THE_TABLE].[x_1])

(I'm not sure what Oracle does - I'd have to re-install it first ...)

@Scott:
> Yes, you're expecting an MVCC database to behave like a row locking database.

Hmm... So, it seems that PG is not so well suited for a datawarehouse and/or performing extensive statistics/calculations/reportings on large tables, is it?

Greetings
Harald

---(end of broadcast)---
TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] no index-usage on aggregate-functions?
On Tue, 29 Jun 2004, Harald Lau (Sector-X) wrote:

> > Average and sum can never use an index AFAIK, in any db server. You need information from every row.
> Take a look at the SQL Server counterpart:
>
> create index x_1 on the_table (num_found)
> select avg(num_found) from the_table
> -> Index Scan(OBJECT:([midata].[dbo].[THE_TABLE].[x_1])

But is it really faster? That is the question. This sum needs all the values in that column. As far as I know, SQL Server uses the index because it takes less space on disk than the table and is thus a little faster to scan, due to less IO. In pg the index doesn't work like that, so in pg it's faster to sum all the values using the table itself.

If you have a WHERE clause that only sums some values, then pg will use an index (if applicable) and you will see a speedup.

For min and max the situation is different; there an index can give you the answer without scanning all rows. For that, the workaround exists in pg. The pg aggregate functions are very general, and no one has special-cased min/max yet. Until that happens the workaround works, and it is fast.

> So, it seems that PG is not so well suited for a datawarehouse and/or performing extensive statistics/calculations/reportings on large tables, is it?

I don't see how you can say that from your example. Just because SQL Server uses an index for the sum above does not mean that it is a lot faster. It still has to do as many additions as pg has to do. Sure, MVCC is best when you have both reads and writes. But it should still be comparable in speed even if you only do reads.

--
/Dennis Björklund

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
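The min/max workaround mentioned above is rewriting the aggregate as an ORDER BY ... LIMIT 1 query, which lets the planner walk one end of the b-tree instead of scanning every row. A minimal sketch, using Python's sqlite3 module only so the snippet is self-contained (the table and column names echo the example above; the same SQL rewrite is what was used on the PostgreSQL versions discussed here):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE the_table (num_found INTEGER)")
con.execute("CREATE INDEX x_1 ON the_table (num_found)")
con.executemany("INSERT INTO the_table VALUES (?)", [(5,), (2,), (9,), (7,)])

# The plain aggregate, which pre-8.1 PostgreSQL answered with a seq scan:
max_agg = con.execute("SELECT max(num_found) FROM the_table").fetchone()[0]

# The workaround: equivalent answer, but index-friendly:
max_idx = con.execute(
    "SELECT num_found FROM the_table ORDER BY num_found DESC LIMIT 1"
).fetchone()[0]

print(max_agg, max_idx)
```

The same rewrite with ORDER BY num_found ASC LIMIT 1 covers min().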
Re: [PERFORM] no index-usage on aggregate-functions?
On Tue, Jun 29, 2004 at 10:46:27 +0200, Harald Lau (Sector-X) [EMAIL PROTECTED] wrote: h... So, it seems that PG is not s well suited for a datawarehouse and/or performing extensive statistics/calculations/reportings on large tables, is it? If you are doing lots of selects of aggregates relative to the number of updates, you can cache the values of interest in derived tables and use triggers to keep those tables up to date. ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
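The trigger-maintained derived table suggested above can be sketched as follows. SQLite is used only so the snippet is runnable; the table and trigger names are invented for illustration, and PostgreSQL would use CREATE FUNCTION ... RETURNS trigger plus CREATE TRIGGER instead of SQLite's inline trigger bodies:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE the_table (num_found INTEGER);

-- Derived table caching the aggregates of interest:
CREATE TABLE the_table_stats (n INTEGER, total INTEGER);
INSERT INTO the_table_stats VALUES (0, 0);

-- Triggers keep the cache current on every write:
CREATE TRIGGER trg_ins AFTER INSERT ON the_table BEGIN
  UPDATE the_table_stats SET n = n + 1, total = total + NEW.num_found;
END;
CREATE TRIGGER trg_del AFTER DELETE ON the_table BEGIN
  UPDATE the_table_stats SET n = n - 1, total = total - OLD.num_found;
END;
""")

con.executemany("INSERT INTO the_table VALUES (?)", [(3,), (4,), (5,)])
con.execute("DELETE FROM the_table WHERE num_found = 4")

# count(*) and sum() now come from a single cached row:
n, total = con.execute("SELECT n, total FROM the_table_stats").fetchone()
print(n, total)
```

As the post notes, this taxes every update to buy O(1) aggregate reads, so it only pays off when selects of the aggregates dominate.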
Re: [PERFORM] no index-usage on aggregate-functions?
On Tue, 2004-06-29 at 02:46, Harald Lau (Sector-X) wrote:
> @Chris:
> > > SELECT count(*) FROM the_table => Seq scan -> takes about 12 sec
> > This cannot be made O(1) in postgres due to MVCC. You just have to live with it.
> Bad news. BTW: in this case you could work around it with "select reltuples from pg_class where relname='the_table'" (yes, I know: presumes a regular vacuum analyze).

Note that there ARE other options. While the inability to provide a speedy count is a cost of using an MVCC system, the ability to allow thousands of readers to run while updates are happening underneath them more than makes up for the slower aggregate performance. The other options involve maintaining another table with a single (visible) row, kept up to date by a trigger on the main table that fires and updates that row to reflect the count of the table. This is costly on updates, but may be worth doing in certain situations. Personally, I haven't had a great need to do a count(*) on my tables that much, and on large tables, approximations are usually fine.

> Average and sum can never use an index AFAIK, in any db server. You need information from every row. Take a look at the SQL Server counterpart:
>
> create index x_1 on the_table (num_found)
> select avg(num_found) from the_table
> -> Index Scan(OBJECT:([midata].[dbo].[THE_TABLE].[x_1])
>
> (I'm not sure what Oracle does - I'd have to re-install it first ...)

There's a good chance Oracle can use the index too. That's because Oracle is still a row-locking database at heart; its MVCC system sits on top of that in rollback segments. So the main store is serialized and can be indexed, while the updates live in the rollback segment. This, however, is not paradise: it limits Oracle's performance on things like long-running transactions, and makes it slower as the amount of information in the rollback segment grows. Meanwhile, PostgreSQL uses an in-store MVCC mechanism.
This system means that all index accesses must then hit the actual MVCC storage, since indexes aren't easily serialized.

> @Scott:
> > Yes, you're expecting an MVCC database to behave like a row locking database.
> Hmm... So, it seems that PG is not so well suited for a datawarehouse and/or performing extensive statistics/calculations/reportings on large tables, is it?

On the contrary, it makes it GREAT for datawarehousing. Not because any one child process will be super fast, but because ALL the child processes will run reasonably fast, even under very heavy read and write load. Note that if you've got the memory for the hash agg algo to fire into shared memory, it's pretty darned fast now, so if the data (mostly) fit into kernel cache you're gold. And 12 gig Intel boxes aren't that expensive, compared to an Oracle license.

---(end of broadcast)---
TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] no index-usage on aggregate-functions?
> Note that there ARE other options. While the inability to provide a speedy count is a cost of using an MVCC system, the ability to allow thousands of readers to run while updates are happening underneath them more than makes up for the slower aggregate performance.

IMO this depends on the priorities of your application, resp. the customer's intentions and wishes.

> This, however, is not paradise.

You can't have it all ;-)

> On the contrary, it makes it GREAT for datawarehousing. Not because any one child process will be super fast, but because ALL the child processes will run reasonably fast, even under very heavy read and write load.

What I meant by "datawarehouse" is many DBs at many locations whose data are to be collected in one central DB in order to mix them up, sum them up, or do anything equivalent. But in fact my quite heavily read/write-accessed DB has been running really fast for 1 1/2 years now, even though it is still on PG 7.2. The one and only bottleneck is the statistics and the reports - and the tables are getting larger and larger ...

> Note that if you've got the memory for the hash agg algo to fire into shared memory, it's pretty darned fast now,

Yes, I've noticed that here on the testing server.

> so if the data (mostly) fit into kernel cache you're gold. And 12 gig Intel boxes aren't that expensive, compared to an Oracle license.

*that's* the point ...

Anyway: greetings, and thanks for your answers
Harald

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html