Re: [PERFORM] Query using SeqScan instead of IndexScan
On Apr 2, 2006, at 6:30 PM, Josh Berkus wrote: But just as a follow up question to your #1 suggestion, I have 8 GB of ram in my production server. You're saying to set the effective_cache_size then to 5 GB roughly? Somewhere around 655360? Currently it is set to 65535. Is that something that's OS dependent? I'm not sure how much memory my server sets aside for disk caching. Yes, about. It's really a judgement call; you're looking for the approximate combined RAM available for disk caching and shared mem. However, this is just used as a way of estimating the probability that the data you want is cached in memory, so you're just trying to be order-of-magnitude accurate, not to-the-MB accurate. FWIW, I typically set effective_cache_size to the amount of memory in the machine minus 1G for the OS and various other daemons, etc. But as Josh said, as long as your somewhere in the ballpark it's probably good enough. -- Jim C. Nasby, Sr. Engineering Consultant [EMAIL PROTECTED] Pervasive Software http://pervasive.comwork: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461 ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] Query using SeqScan instead of IndexScan
On Apr 1, 2006, at 12:51 PM, Brendan Duddridge wrote: from SELECT * FROM pg_stats WHERE tablename='table' AND attname='category_id' I find correlation on category_product for category_id is 0.643703 Would setting the index on category_id to be clustered help with this? It would absolutely help on the query in question. In my experience, a correlation of 0.64 is too low to allow an index scan to be used for anything but a tiny number of rows. -- Jim C. Nasby, Sr. Engineering Consultant [EMAIL PROTECTED] Pervasive Software http://pervasive.comwork: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461 ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [PERFORM] Query using SeqScan instead of IndexScan
Brendan, But just as a follow up question to your #1 suggestion, I have 8 GB of ram in my production server. You're saying to set the effective_cache_size then to 5 GB roughly? Somewhere around 655360? Currently it is set to 65535. Is that something that's OS dependent? I'm not sure how much memory my server sets aside for disk caching. Yes, about. It's really a judgement call; you're looking for the approximate combined RAM available for disk caching and shared mem. However, this is just used as a way of estimating the probability that the data you want is cached in memory, so you're just trying to be order-of-magnitude accurate, not to-the-MB accurate. -- Josh Berkus Aglio Database Solutions San Francisco ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] Query using SeqScan instead of IndexScan
Hi Josh, Thanks. I've adjusted my effective_cache_size to 5 GB, so we'll see how that goes. I'm also doing some query and de-normalization optimizations so we'll see how those go too. Brendan Duddridge | CTO | 403-277-5591 x24 | [EMAIL PROTECTED] ClickSpace Interactive Inc. Suite L100, 239 - 10th Ave. SE Calgary, AB T2G 0V9 http://www.clickspace.com On Apr 2, 2006, at 4:30 PM, Josh Berkus wrote: Brendan, But just as a follow up question to your #1 suggestion, I have 8 GB of ram in my production server. You're saying to set the effective_cache_size then to 5 GB roughly? Somewhere around 655360? Currently it is set to 65535. Is that something that's OS dependent? I'm not sure how much memory my server sets aside for disk caching. Yes, about. It's really a judgement call; you're looking for the approximate combined RAM available for disk caching and shared mem. However, this is just used as a way of estimating the probability that the data you want is cached in memory, so you're just trying to be order-of-magnitude accurate, not to-the-MB accurate. -- Josh Berkus Aglio Database Solutions San Francisco ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org smime.p7s Description: S/MIME cryptographic signature
Re: [PERFORM] Query using SeqScan instead of IndexScan
On Sat, Apr 01, 2006 at 11:23:37AM +1000, chris smith wrote: On 4/1/06, Brendan Duddridge [EMAIL PROTECTED] wrote: Hi Jim, I'm not quite sure what you mean by the correlation of category_id? It means how many distinct values does it have (at least that's my understanding of it ;) ). Your understanding is wrong. :) What you're discussing is n_distinct. http://www.postgresql.org/docs/8.1/interactive/view-pg-stats.html correlation: Statistical correlation between physical row ordering and logical ordering of the column values. This ranges from -1 to +1. When the value is near -1 or +1, an index scan on the column will be estimated to be cheaper than when it is near zero, due to reduction of random access to the disk. (This column is NULL if the column data type does not have a operator.) In other words, the following will have a correlation of 1: 1 2 3 ... 998 999 1000 And this is -1... 1000 999 ... 2 1 While this would have a very low correlation: 1 1000 2 999 ... The lower the correlation, the more expensive an index scan is, because it's more random. As I mentioned, I believe that the current index scan cost estimator is flawed though, because it will bias heavily against correlations that aren't close to 1 or -1. So, what does SELECT * FROM pg_stats WHERE tablename='table' AND attname='category_id'; show? -- Jim C. Nasby, Sr. Engineering Consultant [EMAIL PROTECTED] Pervasive Software http://pervasive.comwork: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461 ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] Query using SeqScan instead of IndexScan
Hi Jim, from SELECT * FROM pg_stats WHERE tablename='table' AND attname='category_id' I find correlation on category_product for category_id is 0.643703 Would setting the index on category_id to be clustered help with this? Thanks, Brendan Duddridge | CTO | 403-277-5591 x24 | [EMAIL PROTECTED] ClickSpace Interactive Inc. Suite L100, 239 - 10th Ave. SE Calgary, AB T2G 0V9 http://www.clickspace.com On Apr 1, 2006, at 8:32 AM, Jim C. Nasby wrote: On Sat, Apr 01, 2006 at 11:23:37AM +1000, chris smith wrote: On 4/1/06, Brendan Duddridge [EMAIL PROTECTED] wrote: Hi Jim, I'm not quite sure what you mean by the correlation of category_id? It means how many distinct values does it have (at least that's my understanding of it ;) ). Your understanding is wrong. :) What you're discussing is n_distinct. http://www.postgresql.org/docs/8.1/interactive/view-pg-stats.html correlation: Statistical correlation between physical row ordering and logical ordering of the column values. This ranges from -1 to +1. When the value is near -1 or +1, an index scan on the column will be estimated to be cheaper than when it is near zero, due to reduction of random access to the disk. (This column is NULL if the column data type does not have a operator.) In other words, the following will have a correlation of 1: 1 2 3 ... 998 999 1000 And this is -1... 1000 999 ... 2 1 While this would have a very low correlation: 1 1000 2 999 ... The lower the correlation, the more expensive an index scan is, because it's more random. As I mentioned, I believe that the current index scan cost estimator is flawed though, because it will bias heavily against correlations that aren't close to 1 or -1. So, what does SELECT * FROM pg_stats WHERE tablename='table' AND attname='category_id'; show? -- Jim C. Nasby, Sr. Engineering Consultant [EMAIL PROTECTED] Pervasive Software http://pervasive.comwork: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461 ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org smime.p7s Description: S/MIME cryptographic signature
Re: [PERFORM] Query using SeqScan instead of IndexScan
On 4/2/06, Jim C. Nasby [EMAIL PROTECTED] wrote: On Sat, Apr 01, 2006 at 11:23:37AM +1000, chris smith wrote: On 4/1/06, Brendan Duddridge [EMAIL PROTECTED] wrote: Hi Jim, I'm not quite sure what you mean by the correlation of category_id? It means how many distinct values does it have (at least that's my understanding of it ;) ). Your understanding is wrong. :) What you're discussing is n_distinct. Geez, I'm going well this week ;) Thanks for the detailed info. -- Postgresql php tutorials http://www.designmagick.com/ ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] Query using SeqScan instead of IndexScan
On 4/2/06, chris smith [EMAIL PROTECTED] wrote: On 4/2/06, Jim C. Nasby [EMAIL PROTECTED] wrote: On Sat, Apr 01, 2006 at 11:23:37AM +1000, chris smith wrote: On 4/1/06, Brendan Duddridge [EMAIL PROTECTED] wrote: Hi Jim, I'm not quite sure what you mean by the correlation of category_id? It means how many distinct values does it have (at least that's my understanding of it ;) ). Your understanding is wrong. :) What you're discussing is n_distinct. rant It'd be nice if the database developers agreed on what terms meant. http://dev.mysql.com/doc/refman/5.1/en/myisam-index-statistics.html The SHOW INDEX statement displays a cardinality value based on N/S, where N is the number of rows in the table and S is the average value group size. That ratio yields an approximate number of value groups in the table. /rant A work colleague found that information a few weeks ago so that's where my misunderstanding came from - if I'm reading that right they use n_distinct as their cardinality basis.. then again I could be reading that completely wrong too. I believe postgres (because it's a lot more standards compliant).. but sheesh - what a difference! This week's task - stop reading mysql documentation. -- Postgresql php tutorials http://www.designmagick.com/ ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] Query using SeqScan instead of IndexScan
chris smith wrote: I believe postgres (because it's a lot more standards compliant).. but sheesh - what a difference! This week's task - stop reading mysql documentation. You don't _have_ to believe Postgres -- this is stuff taught in any statistics course. -- Alvaro Herrerahttp://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] Query using SeqScan instead of IndexScan
chris smith wrote: rant It'd be nice if the database developers agreed on what terms meant. http://dev.mysql.com/doc/refman/5.1/en/myisam-index-statistics.html The SHOW INDEX statement displays a cardinality value based on N/S, where N is the number of rows in the table and S is the average value group size. That ratio yields an approximate number of value groups in the table. /rant A work colleague found that information a few weeks ago so that's where my misunderstanding came from - if I'm reading that right they use n_distinct as their cardinality basis.. then again I could be reading that completely wrong too. Yeah that's right - e.g using the same table in postgres and mysql: pgsql SELECT attname,n_distinct,correlation FROM pg_stats WHERE tablename='fact0' AND attname LIKE 'd%key'; attname | n_distinct | correlation -++- d0key | 1 | -0.0211169 d1key |100 |0.124012 d2key | 10 |0.998393 (3 rows) mysql SHOW INDEX FROM fact0 - ; +---++-+--+-+---+-+--++--++-+ | Table | Non_unique | Key_name| Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | +---++-+--+-+---+-+--++--++-+ | fact0 | 1 | fact0_d0key |1 | d0key | A | 1 | NULL | NULL | | BTREE | | | fact0 | 1 | fact0_d1key |1 | d1key | A | 100 | NULL | NULL | | BTREE | | | fact0 | 1 | fact0_d2key |1 | d2key | A | 10 | NULL | NULL | | BTREE | | +---++-+--+-+---+-+--++--++-+ 3 rows in set (0.00 sec) It is a bit confusing - '(distinct) cardinality' might be a better heading for their 'cardinality' column! On the correlation business - I don't think Mysql calculates it (or if it does, its not displayed). I believe postgres (because it's a lot more standards compliant).. but sheesh - what a difference! Well yes - however, to be fair to the Mysql guys, AFAICS the capture and display of index stats (and any other optimizer related data) is not part of any standard. Cheers Mark ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] Query using SeqScan instead of IndexScan
Mark Kirkwood [EMAIL PROTECTED] writes: It is a bit confusing - '(distinct) cardinality' might be a better heading for their 'cardinality' column! The usual mathematical meaning of cardinality is the number of members in a set. That isn't real helpful for the point at hand, because the mathematical definition of a set disallows duplicate members, so if you're dealing with non-unique values you could argue it either way about whether to count duplicates or not. However, I read in the SQL99 spec (3.1 Definitions) d) cardinality (of a value of a collection type): The number of elements in that value. Those elements need not necessarily have distinct values. so ... as all too often ... the mysql boys have not got a clue about standards compliance. They are using this term in the opposite way from how the SQL committee uses it. regards, tom lane ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] Query using SeqScan instead of IndexScan
What's the correlation of category_id? The current index scan cost estimator places a heavy penalty on anything with a correlation much below about 90%. On Wed, Mar 29, 2006 at 08:12:28PM -0700, Brendan Duddridge wrote: Hi, I have a query that is using a sequential scan instead of an index scan. I've turned off sequential scans and it is in fact faster with the index scan. Here's my before and after. Before: ssdev=# SET enable_seqscan TO DEFAULT; ssdev=# explain analyze select cp.product_id from category_product cp, product_attribute_value pav where cp.category_id = 1001082 and cp.product_id = pav.product_id; QUERY PLAN -- Hash Join (cost=25.52..52140.59 rows=5139 width=4) (actual time=4.521..2580.520 rows=19695 loops=1) Hash Cond: (outer.product_id = inner.product_id) - Seq Scan on product_attribute_value pav (cost=0.00..40127.12 rows=2387312 width=4) (actual time=0.039..1469.295 rows=2385846 loops=1) - Hash (cost=23.10..23.10 rows=970 width=4) (actual time=2.267..2.267 rows=1140 loops=1) - Index Scan using x_category_product__category_id_fk_idx on category_product cp (cost=0.00..23.10 rows=970 width=4) (actual time=0.122..1.395 rows=1140 loops=1) Index Cond: (category_id = 1001082) Total runtime: 2584.221 ms (7 rows) After: ssdev=# SET enable_seqscan TO false; ssdev=# explain analyze select cp.product_id from category_product cp, product_attribute_value pav where cp.category_id = 1001082 and cp.product_id = pav.product_id; QUERY PLAN - Nested Loop (cost=0.00..157425.22 rows=5139 width=4) (actual time=0.373..71.177 rows=19695 loops=1) - Index Scan using x_category_product__category_id_fk_idx on category_product cp (cost=0.00..23.10 rows=970 width=4) (actual time=0.129..1.438 rows=1140 loops=1) Index Cond: (category_id = 1001082) - Index Scan using product_attribute_value__product_id_fk_idx on product_attribute_value pav (cost=0.00..161.51 rows=61 width=4) (actual time=0.016..0.053 rows=17 loops=1140) Index Cond: (outer.product_id = pav.product_id) Total runtime: 74.747 ms (6 rows) There's quite a big difference in speed there. 2584.221 ms vs. 74.747 ms. Any ideas what I can do to improve this without turning sequential scanning off? Thanks, Brendan Duddridge | CTO | 403-277-5591 x24 | [EMAIL PROTECTED] ClickSpace Interactive Inc. Suite L100, 239 - 10th Ave. SE Calgary, AB T2G 0V9 http://www.clickspace.com -- Jim C. Nasby, Sr. Engineering Consultant [EMAIL PROTECTED] Pervasive Software http://pervasive.comwork: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461 ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] Query using SeqScan instead of IndexScan
Hi Jim, I'm not quite sure what you mean by the correlation of category_id? The category_id is part of a compound primary key in the category_product table. The primary key on category_product is (category_id, product_id). Here's the definitions of the two tables involved in the join: Table public.category_product Column| Type | Modifiers -+--+--- category_id | integer | not null product_id | integer | not null en_name_sort_order | integer | fr_name_sort_order | integer | merchant_sort_order | integer | price_sort_order| integer | merchant_count | integer | is_active | character varying(5) | Indexes: x_category_product_pk PRIMARY KEY, btree (category_id, product_id) category_product__is_active_idx btree (is_active) category_product__merchant_sort_order_idx btree (merchant_sort_order) x_category_product__category_id_fk_idx btree (category_id) CLUSTER x_category_product__product_id_fk_idx btree (product_id) Foreign-key constraints: x_category_product_category_fk FOREIGN KEY (category_id) REFERENCES category(category_id) DEFERRABLE INITIALLY DEFERRED x_category_product_product_fk FOREIGN KEY (product_id) REFERENCES product(product_id) DEFERRABLE INITIALLY DEFERRED Table public.product_attribute_value Column | Type | Modifiers +---+--- attribute_id | integer | not null attribute_unit_id | integer | attribute_value_id | integer | boolean_value | character varying(5) | decimal_value | numeric(30,10)| product_attribute_value_id | integer | not null product_id | integer | not null product_reference_id | integer | status_code| character varying(32) | Indexes: product_attribute_value_pk PRIMARY KEY, btree (product_attribute_value_id) product_attribute_value__attribute_id_fk_idx btree (attribute_id) product_attribute_value__attribute_unit_id_fk_idx btree (attribute_unit_id) product_attribute_value__attribute_value_id_fk_idx btree (attribute_value_id) product_attribute_value__product_id_fk_idx btree (product_id) product_attribute_value__product_reference_id_fk_idx btree (product_reference_id) Foreign-key constraints: product_attribute_value_attribute_fk FOREIGN KEY (attribute_id) REFERENCES attribute(attribute_id) DEFERRABLE INITIALLY DEFERRED product_attribute_value_attributeunit_fk FOREIGN KEY (attribute_unit_id) REFERENCES attribute_unit(attribute_unit_id) DEFERRABLE INITIALLY DEFERRED product_attribute_value_attributevalue_fk FOREIGN KEY (attribute_value_id) REFERENCES attribute_value(attribute_value_id) DEFERRABLE INITIALLY DEFERRED product_attribute_value_product_fk FOREIGN KEY (product_id) REFERENCES product(product_id) DEFERRABLE INITIALLY DEFERRED product_attribute_value_productreference_fk FOREIGN KEY (product_reference_id) REFERENCES product(product_id) DEFERRABLE INITIALLY DEFERRED Not sure if that helps answer your question, but the query is pretty slow. Sometimes it takes 5 - 15 seconds depending on the category_id specified. Thanks, Brendan Duddridge | CTO | 403-277-5591 x24 | [EMAIL PROTECTED] ClickSpace Interactive Inc. Suite L100, 239 - 10th Ave. SE Calgary, AB T2G 0V9 http://www.clickspace.com On Mar 31, 2006, at 8:59 AM, Jim C. Nasby wrote: What's the correlation of category_id? The current index scan cost estimator places a heavy penalty on anything with a correlation much below about 90%. On Wed, Mar 29, 2006 at 08:12:28PM -0700, Brendan Duddridge wrote: Hi, I have a query that is using a sequential scan instead of an index scan. I've turned off sequential scans and it is in fact faster with the index scan. Here's my before and after. Before: ssdev=# SET enable_seqscan TO DEFAULT; ssdev=# explain analyze select cp.product_id from category_product cp, product_attribute_value pav where cp.category_id = 1001082 and cp.product_id = pav.product_id; QUERY PLAN - --- - --- -- Hash Join (cost=25.52..52140.59 rows=5139 width=4) (actual time=4.521..2580.520 rows=19695 loops=1) Hash Cond: (outer.product_id = inner.product_id) - Seq Scan on product_attribute_value pav (cost=0.00..40127.12 rows=2387312 width=4) (actual time=0.039..1469.295 rows=2385846
Re: [PERFORM] Query using SeqScan instead of IndexScan
On 4/1/06, Brendan Duddridge [EMAIL PROTECTED] wrote: Hi Jim, I'm not quite sure what you mean by the correlation of category_id? It means how many distinct values does it have (at least that's my understanding of it ;) ). select category_id, count(*) from category_product group by category_id; will show you how many category_id's there are and how many products are in each category. Having a lot of products in one category (or having a small amount of categories) can slow things down because the db can't use the index effectively.. which might be what you're seeing (hence why it's fast for some categories, slow for others). On Mar 31, 2006, at 8:59 AM, Jim C. Nasby wrote: What's the correlation of category_id? The current index scan cost estimator places a heavy penalty on anything with a correlation much below about 90%. On Wed, Mar 29, 2006 at 08:12:28PM -0700, Brendan Duddridge wrote: Hi, I have a query that is using a sequential scan instead of an index scan. I've turned off sequential scans and it is in fact faster with the index scan. Here's my before and after. Before: ssdev=# SET enable_seqscan TO DEFAULT; ssdev=# explain analyze select cp.product_id from category_product cp, product_attribute_value pav where cp.category_id = 1001082 and cp.product_id = pav.product_id; QUERY PLAN - --- - --- -- Hash Join (cost=25.52..52140.59 rows=5139 width=4) (actual time=4.521..2580.520 rows=19695 loops=1) Hash Cond: (outer.product_id = inner.product_id) - Seq Scan on product_attribute_value pav (cost=0.00..40127.12 rows=2387312 width=4) (actual time=0.039..1469.295 rows=2385846 loops=1) - Hash (cost=23.10..23.10 rows=970 width=4) (actual time=2.267..2.267 rows=1140 loops=1) - Index Scan using x_category_product__category_id_fk_idx on category_product cp (cost=0.00..23.10 rows=970 width=4) (actual time=0.122..1.395 rows=1140 loops=1) Index Cond: (category_id = 1001082) Total runtime: 2584.221 ms (7 rows) After: ssdev=# SET enable_seqscan TO false; ssdev=# explain analyze select cp.product_id from category_product cp, product_attribute_value pav where cp.category_id = 1001082 and cp.product_id = pav.product_id; QUERY PLAN - --- - --- - Nested Loop (cost=0.00..157425.22 rows=5139 width=4) (actual time=0.373..71.177 rows=19695 loops=1) - Index Scan using x_category_product__category_id_fk_idx on category_product cp (cost=0.00..23.10 rows=970 width=4) (actual time=0.129..1.438 rows=1140 loops=1) Index Cond: (category_id = 1001082) - Index Scan using product_attribute_value__product_id_fk_idx on product_attribute_value pav (cost=0.00..161.51 rows=61 width=4) (actual time=0.016..0.053 rows=17 loops=1140) Index Cond: (outer.product_id = pav.product_id) Total runtime: 74.747 ms (6 rows) There's quite a big difference in speed there. 2584.221 ms vs. 74.747 ms. Any ideas what I can do to improve this without turning sequential scanning off? Thanks, Brendan Duddridge | CTO | 403-277-5591 x24 | [EMAIL PROTECTED] ClickSpace Interactive Inc. Suite L100, 239 - 10th Ave. SE Calgary, AB T2G 0V9 http://www.clickspace.com -- Jim C. Nasby, Sr. Engineering Consultant [EMAIL PROTECTED] Pervasive Software http://pervasive.comwork: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461 ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly -- Postgresql php tutorials http://www.designmagick.com/ ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] Query using SeqScan instead of IndexScan
Ah I see. Ok, well we have a very wide variety here... category_id | count -+--- 1000521 | 31145 1001211 | 22991 1001490 | 22019 1001628 | 12472 146 | 10480 187 | 10338 1001223 | 10020 1001560 | 9532 1000954 | 8633 1001314 | 8191 1001482 | 8140 1001556 | 7959 1001481 | 7850 [snip...] 1001133 | 1 1000532 | 1 1000691 | 1 1000817 | 1 1000783 | 1 1000689 | 1 (1157 rows) So what's the best kind of query to handle this kind of data to make it fast in all cases? I'd like get down to sub-second response times. currently we have: select cp.product_id from category_product cp, product_attribute_value pav where cp.category_id = 1001082 and cp.product_id = pav.product_id; Thanks, Brendan Duddridge | CTO | 403-277-5591 x24 | [EMAIL PROTECTED] ClickSpace Interactive Inc. Suite L100, 239 - 10th Ave. SE Calgary, AB T2G 0V9 http://www.clickspace.com On Mar 31, 2006, at 6:23 PM, chris smith wrote: On 4/1/06, Brendan Duddridge [EMAIL PROTECTED] wrote: Hi Jim, I'm not quite sure what you mean by the correlation of category_id? It means how many distinct values does it have (at least that's my understanding of it ;) ). select category_id, count(*) from category_product group by category_id; will show you how many category_id's there are and how many products are in each category. Having a lot of products in one category (or having a small amount of categories) can slow things down because the db can't use the index effectively.. which might be what you're seeing (hence why it's fast for some categories, slow for others). On Mar 31, 2006, at 8:59 AM, Jim C. Nasby wrote: What's the correlation of category_id? The current index scan cost estimator places a heavy penalty on anything with a correlation much below about 90%. On Wed, Mar 29, 2006 at 08:12:28PM -0700, Brendan Duddridge wrote: Hi, I have a query that is using a sequential scan instead of an index scan. I've turned off sequential scans and it is in fact faster with the index scan. Here's my before and after. Before: ssdev=# SET enable_seqscan TO DEFAULT; ssdev=# explain analyze select cp.product_id from category_product cp, product_attribute_value pav where cp.category_id = 1001082 and cp.product_id = pav.product_id; QUERY PLAN --- -- --- --- -- --- -- Hash Join (cost=25.52..52140.59 rows=5139 width=4) (actual time=4.521..2580.520 rows=19695 loops=1) Hash Cond: (outer.product_id = inner.product_id) - Seq Scan on product_attribute_value pav (cost=0.00..40127.12 rows=2387312 width=4) (actual time=0.039..1469.295 rows=2385846 loops=1) - Hash (cost=23.10..23.10 rows=970 width=4) (actual time=2.267..2.267 rows=1140 loops=1) - Index Scan using x_category_product__category_id_fk_idx on category_product cp (cost=0.00..23.10 rows=970 width=4) (actual time=0.122..1.395 rows=1140 loops=1) Index Cond: (category_id = 1001082) Total runtime: 2584.221 ms (7 rows) After: ssdev=# SET enable_seqscan TO false; ssdev=# explain analyze select cp.product_id from category_product cp, product_attribute_value pav where cp.category_id = 1001082 and cp.product_id = pav.product_id; QUERY PLAN --- -- --- --- -- --- - Nested Loop (cost=0.00..157425.22 rows=5139 width=4) (actual time=0.373..71.177 rows=19695 loops=1) - Index Scan using x_category_product__category_id_fk_idx on category_product cp (cost=0.00..23.10 rows=970 width=4) (actual time=0.129..1.438 rows=1140 loops=1) Index Cond: (category_id = 1001082) - Index Scan using product_attribute_value__product_id_fk_idx on product_attribute_value pav (cost=0.00..161.51 rows=61 width=4) (actual time=0.016..0.053 rows=17 loops=1140) Index Cond: (outer.product_id = pav.product_id) Total runtime: 74.747 ms (6 rows) There's quite a big difference in speed there. 2584.221 ms vs. 74.747 ms. Any ideas what I can do to improve this without turning sequential scanning off? Thanks, ___ _ Brendan Duddridge | CTO | 403-277-5591 x24 | [EMAIL PROTECTED] ClickSpace Interactive Inc. Suite L100, 239 - 10th Ave. SE Calgary, AB T2G 0V9 http://www.clickspace.com -- Jim C. Nasby, Sr. Engineering Consultant [EMAIL PROTECTED] Pervasive Software http://pervasive.comwork:
[PERFORM] Query using SeqScan instead of IndexScan
Hi,I have a query that is using a sequential scan instead of an index scan. I've turned off sequential scans and it is in fact faster with the index scan.Here's my before and after.Before:ssdev=# SET enable_seqscan TO DEFAULT;ssdev=# explain analyze select cp.product_id from category_product cp, product_attribute_value pav where cp.category_id = 1001082 and cp.product_id = pav.product_id; QUERY PLAN -- Hash Join (cost=25.52..52140.59 rows=5139 width=4) (actual time=4.521..2580.520 rows=19695 loops=1) Hash Cond: ("outer".product_id = "inner".product_id) - Seq Scan on product_attribute_value pav (cost=0.00..40127.12 rows=2387312 width=4) (actual time=0.039..1469.295 rows=2385846 loops=1) - Hash (cost=23.10..23.10 rows=970 width=4) (actual time=2.267..2.267 rows=1140 loops=1) - Index Scan using x_category_product__category_id_fk_idx on category_product cp (cost=0.00..23.10 rows=970 width=4) (actual time=0.122..1.395 rows=1140 loops=1) Index Cond: (category_id = 1001082) Total runtime: 2584.221 ms(7 rows)After:ssdev=# SET enable_seqscan TO false;ssdev=# explain analyze select cp.product_id from category_product cp, product_attribute_value pav where cp.category_id = 1001082 and cp.product_id = pav.product_id; QUERY PLAN - Nested Loop (cost=0.00..157425.22 rows=5139 width=4) (actual time=0.373..71.177 rows=19695 loops=1) - Index Scan using x_category_product__category_id_fk_idx on category_product cp (cost=0.00..23.10 rows=970 width=4) (actual time=0.129..1.438 rows=1140 loops=1) Index Cond: (category_id = 1001082) - Index Scan using product_attribute_value__product_id_fk_idx on product_attribute_value pav (cost=0.00..161.51 rows=61 width=4) (actual time=0.016..0.053 rows=17 loops=1140) Index Cond: ("outer".product_id = pav.product_id) Total runtime: 74.747 ms(6 rows)There's quite a big difference in speed there. 2584.221 ms vs. 74.747 ms.Any ideas what I can do to improve this without turning sequential scanning off?Thanks, Brendan Duddridge | CTO | 403-277-5591 x24 | [EMAIL PROTECTED] ClickSpace Interactive Inc. Suite L100, 239 - 10th Ave. SE Calgary, AB T2G 0V9 http://www.clickspace.com smime.p7s Description: S/MIME cryptographic signature
Re: [PERFORM] Query using SeqScan instead of IndexScan
Oops. I forgot to mention that I was using PostgreSQL 8.1.3 on Mac OS X.Thanks, Brendan Duddridge | CTO | 403-277-5591 x24 | [EMAIL PROTECTED] ClickSpace Interactive Inc. Suite L100, 239 - 10th Ave. SE Calgary, AB T2G 0V9 http://www.clickspace.com On Mar 29, 2006, at 8:12 PM, Brendan Duddridge wrote:Hi,I have a query that is using a sequential scan instead of an index scan. I've turned off sequential scans and it is in fact faster with the index scan.Here's my before and after.Before:ssdev=# SET enable_seqscan TO DEFAULT;ssdev=# explain analyze select cp.product_id from category_product cp, product_attribute_value pav where cp.category_id = 1001082 and cp.product_id = pav.product_id; QUERY PLAN -- Hash Join (cost=25.52..52140.59 rows=5139 width=4) (actual time=4.521..2580.520 rows=19695 loops=1) Hash Cond: ("outer".product_id = "inner".product_id) - Seq Scan on product_attribute_value pav (cost=0.00..40127.12 rows=2387312 width=4) (actual time=0.039..1469.295 rows=2385846 loops=1) - Hash (cost=23.10..23.10 rows=970 width=4) (actual time=2.267..2.267 rows=1140 loops=1) - Index Scan using x_category_product__category_id_fk_idx on category_product cp (cost=0.00..23.10 rows=970 width=4) (actual time=0.122..1.395 rows=1140 loops=1) Index Cond: (category_id = 1001082) Total runtime: 2584.221 ms(7 rows)After:ssdev=# SET enable_seqscan TO false;ssdev=# explain analyze select cp.product_id from category_product cp, product_attribute_value pav where cp.category_id = 1001082 and cp.product_id = pav.product_id; QUERY PLAN - Nested Loop (cost=0.00..157425.22 rows=5139 width=4) (actual time=0.373..71.177 rows=19695 loops=1) - Index Scan using x_category_product__category_id_fk_idx on category_product cp (cost=0.00..23.10 rows=970 width=4) (actual time=0.129..1.438 rows=1140 loops=1) Index Cond: (category_id = 1001082) - Index Scan using product_attribute_value__product_id_fk_idx on product_attribute_value pav (cost=0.00..161.51 rows=61 width=4) (actual time=0.016..0.053 rows=17 loops=1140) Index Cond: ("outer".product_id = pav.product_id) Total runtime: 74.747 ms(6 rows)There's quite a big difference in speed there. 2584.221 ms vs. 74.747 ms.Any ideas what I can do to improve this without turning sequential scanning off?Thanks, Brendan Duddridge | CTO | 403-277-5591 x24 | [EMAIL PROTECTED] ClickSpace Interactive Inc. Suite L100, 239 - 10th Ave. SE Calgary, AB T2G 0V9 http://www.clickspace.com smime.p7s Description: S/MIME cryptographic signature
Re: [PERFORM] Query using SeqScan instead of IndexScan
Brenden, Any ideas what I can do to improve this without turning sequential scanning off? Hmmm, looks like your row estimates are good. Which means it's probably your postgresql.conf parameters which are off. Try the following, in the order below: 1) Raise effective_cache_size to 2/3 of your RAM (remember that ecs is in 8k pages). Test again. 2) Multiply all of the cpu_* costs by 0.3. Test again. 3) Lower random_page_cost by steps to 3.5, then 3.0, then 2.5, then 2.0, testing each time. These are all runtime-settable parameters, so you can test them in one query window, then set them in the main postgresql.conf if they work. -- Josh Berkus Sun Microsystems San Francisco ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match