Re: [PERFORM] Estimation problem with a LIKE clause containing a /
On 11/7/07, Tom Lane [EMAIL PROTECTED] wrote: Hmmm ... what locale are you working in? I notice that the range estimator for this pattern would be ancestors = '1062/' AND ancestors '10620', which will do the right thing in C locale but maybe not so much elsewhere. Sorry for not having mentioned it before. Locale is UTF-8. Version is PostgreSQL 8.1.8 on i686-redhat-linux-gnu, You'd probably get better results with 8.2, which has a noticeably smarter LIKE-estimator, at least for histogram sizes of 100 or more. It's not really possible to upgrade this application to 8.2 for now. It's a very old app based on the thing formerly called as Red Hat WAF and now known as APLAWS and validating WAF and this application with 8.2 will take quite some time. Moreover the db is big and we can't afford the downtime of a migration. I suppose my best bet is to remove the pg_statistic line and to set the statistics to 0 for this column so that the stats are never generated again for this column? Thanks, -- Guillaume ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PERFORM] Estimation problem with a LIKE clause containing a /
Guillaume Smet [EMAIL PROTECTED] writes: [ bad estimate for LIKE ] Hmmm ... what locale are you working in? I notice that the range estimator for this pattern would be ancestors = '1062/' AND ancestors '10620', which will do the right thing in C locale but maybe not so much elsewhere. Version is PostgreSQL 8.1.8 on i686-redhat-linux-gnu, You'd probably get better results with 8.2, which has a noticeably smarter LIKE-estimator, at least for histogram sizes of 100 or more. regards, tom lane ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
[PERFORM] Estimation problem with a LIKE clause containing a /
Hi all, While studying a query taking forever after an ANALYZE on a never analyzed database (a bad estimate causes a nested loop on a lot of tuples), I found the following problem: - without any stats (I removed the line from pg_statistic): ccm_prod_20071106=# explain analyze select * from cms_items where ancestors LIKE '1062/%'; QUERY PLAN -- Seq Scan on cms_items (cost=0.00..689.26 rows=114 width=587) (actual time=0.008..21.692 rows=11326 loops=1) Filter: ((ancestors)::text ~~ '1062/%'::text) Total runtime: 31.097 ms - the estimate is bad (it's expected) but it's sufficient to prevent the nested loop so it's my current workaround - after analyzing the cms_items table (statistics is set to 10 but it's exactly the same for 100): ccm_prod_20071106=# explain analyze select * from cms_items where ancestors LIKE '1062/%'; QUERY PLAN Seq Scan on cms_items (cost=0.00..689.26 rows=*1* width=103) (actual time=0.010..22.024 rows=11326 loops=1) Filter: ((ancestors)::text ~~ '1062/%'::text) Total runtime: 31.341 ms - this estimate leads PostgreSQL to choose a nested loop which is executed more than 11k times and causes the query to take forever. - if I remove the / from the LIKE clause (which I can't as ancestors is more or less a path): ccm_prod_20071106=# explain analyze select * from cms_items where ancestors LIKE '1062%'; QUERY PLAN --- Seq Scan on cms_items (cost=0.00..689.26 rows=*9097* width=103) (actual time=0.043..25.251 rows=11326 loops=1) Filter: ((ancestors)::text ~~ '1062%'::text) Total runtime: 34.778 ms Which is a really good estimate. Is it something expected? The histogram does contain values beginning with '1062/' (5 out of 10) and the cms_items table has ~ 22k rows. Version is PostgreSQL 8.1.8 on i686-redhat-linux-gnu, compiled by GCC gcc (GCC) 3.4.6 20060404 (Red Hat 3.4.6-3). I checked the release notes between 8.1.8 and 8.1.10 and I didn't find anything relevant to fix this problem. Thanks for any help. Regards, -- Guillaume ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] Estimation problem with a LIKE clause containing a /
Guillaume Smet [EMAIL PROTECTED] writes: On 11/7/07, Tom Lane [EMAIL PROTECTED] wrote: Hmmm ... what locale are you working in? I notice that the range estimator for this pattern would be ancestors = '1062/' AND ancestors '10620', which will do the right thing in C locale but maybe not so much elsewhere. Sorry for not having mentioned it before. Locale is UTF-8. I wanted the locale (lc_collate), not the encoding. I suppose my best bet is to remove the pg_statistic line and to set the statistics to 0 for this column so that the stats are never generated again for this column? That would optimize this particular query and probably pessimize a lot of others. I have another LIKE-estimation bug to go look at today too; let me see if this one is fixable or not. regards, tom lane ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [PERFORM] Estimation problem with a LIKE clause containing a /
On 11/7/07, Guillaume Smet [EMAIL PROTECTED] wrote: While studying a query taking forever after an ANALYZE on a never analyzed database (a bad estimate causes a nested loop on a lot of tuples), I found the following problem: [snip] Total runtime: 31.097 ms [snip] Total runtime: 31.341 ms [snip] Total runtime: 34.778 ms Which is a really good estimate. That's a difference of less than *three milliseconds* -- a difference probably way within the expected overhead of running explain analyze. Furthermore, all three queries use the same basic plan: a sequential scan with a filter. At any rate you're microbenchmarking in a way that is not useful to real-world queries. In what way are these timings a problem? Have you tried using an index which supports prefix searches? The text_pattern_ops operator class lets yo do this with a plain B-tree index: create index cms_items_ancestors_index on cms_items (ancestors text_pattern_ops); analyze cms_items; Now all like 'prefix%' queries should use the index. Alexander. ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] Estimation problem with a LIKE clause containing a /
Alexander, On 11/7/07, Alexander Staubo [EMAIL PROTECTED] wrote: That's a difference of less than *three milliseconds* -- a difference probably way within the expected overhead of running explain analyze. Furthermore, all three queries use the same basic plan: a sequential scan with a filter. At any rate you're microbenchmarking in a way that is not useful to real-world queries. In what way are these timings a problem? If you read my previous email carefully, you'll see they aren't a problem: the problem is the estimation, not the timing. This is a self contained test case of a far more complex query which uses a bad plan containing a nested loop due to the bad estimate. Now all like 'prefix%' queries should use the index. Not when you retrieve 50% of this table of 22k rows but that's not my problem anyway. A seqscan is perfectly fine in this case. Thanks anyway. -- Guillaume ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [PERFORM] index stat
On Mon, Nov 5, 2007 at 10:42 AM, in message [EMAIL PROTECTED], Campbell, Lance [EMAIL PROTECTED] wrote: How can I [. . .] get rid of some unnecessary indexes Here's what I periodically run to look for unused indexes: select relname, indexrelname from pg_stat_user_indexes where indexrelname not like '%_pkey' and idx_scan = 0 order by relname, indexrelname ; We omit the primary keys from the list (based on our naming convention) because they are needed to ensure integrity. -Kevin ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [PERFORM] Estimation problem with a LIKE clause containing a /
I wrote: Guillaume Smet [EMAIL PROTECTED] writes: [ bad estimate for LIKE ] Hmmm ... what locale are you working in? I notice that the range estimator for this pattern would be ancestors = '1062/' AND ancestors '10620', which will do the right thing in C locale but maybe not so much elsewhere. I've applied a patch that might help you: http://archives.postgresql.org/pgsql-committers/2007-11/msg00104.php regards, tom lane ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] Estimation problem with a LIKE clause containing a /
On 11/7/07, Tom Lane [EMAIL PROTECTED] wrote: I wanted the locale (lc_collate), not the encoding. fr_FR.UTF-8 That would optimize this particular query and probably pessimize a lot of others. Sure but there aren't a lot of queries based on the ancestors field and if they are a bit slower, it's not a problem. However having a query taking forever is not acceptable as the content management app is unaccessible. So it can be an acceptable solution in this case, even if not perfect. -- Guillaume ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [PERFORM] Estimation problem with a LIKE clause containing a /
On 11/8/07, Tom Lane [EMAIL PROTECTED] wrote: I've applied a patch that might help you: http://archives.postgresql.org/pgsql-committers/2007-11/msg00104.php Thanks. I'll build a RPM package tomorrow with this patch and let you know if it fixes the problem. -- Guillaume ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [PERFORM] partitioned table and ORDER BY indexed_field DESC LIMIT 1
Gregory Stark wrote: Mark Kirkwood [EMAIL PROTECTED] writes: I spent today looking at getting this patch into a self contained state. Working against HEAD I'm getting bogged down in the PathKeyItem to PathKey/EquivalenceClass/EquivalenceMember(s) change. So I figured I'd divide and conquer to some extent, and initially provide a patch: - against 8.2.(5) - self contained (i.e no mystery functions) That would be helpful for me. It would include the bits I'm looking for. The next step would be to update to to HEAD. That would hopefully provide some useful material for others working on this. If that's not too much work then that would be great but if it's a lot of work then it may not be worth it if I'm planning to only take certain bits. On the other hand if it's good then we might just want to take it wholesale and then add to it. Here is a (somewhat hurried) self-contained version of the patch under discussion. It applies to 8.2.5 and the resultant code compiles and runs. I've left in some unneeded parallel stuff (PathLocus struct), which I can weed out in a subsequent version if desired. I also removed the 'cdb ' from most of the function names and (I hope) any Greenplum copyrights. I discovered that the patch solves a slightly different problem... it pulls up index scans as a viable path choice, (but not for the DESC case) but does not push down the LIMIT to the child tables ... so the actual performance improvement is zero - however hopefully the patch provides useful raw material to help. e.g - using the examine schema from the OP email - but removing the DESC from the query: part=# set enable_seqscan=off; SET part=# explain SELECT * FROM n_traf ORDER BY date_time LIMIT 1; QUERY PLAN - Limit (cost=198367.14..198367.15 rows=1 width=20) - Sort (cost=198367.14..200870.92 rows=1001510 width=20) Sort Key: public.n_traf.date_time - Result (cost=0.00..57464.92 rows=1001510 width=20) - Append (cost=0.00..57464.92 rows=1001510 width=20) - Index Scan using n_traf_date_time_login_id on n_traf (cost=0.00..66.90 rows=1510 width=20) - Index Scan using n_traf_y2007m01_date_time_login_id on n_traf_y2007m01 n_traf (cost=0.00..4748.38 rows=83043 width=20) - Index Scan using n_traf_y2007m02_date_time_login_id on n_traf_y2007m02 n_traf (cost=0.00..4772.60 rows=83274 width=20) - Index Scan using n_traf_y2007m03_date_time_login_id on n_traf_y2007m03 n_traf (cost=0.00..4782.12 rows=83330 width=20) - Index Scan using n_traf_y2007m04_date_time_login_id on n_traf_y2007m04 n_traf (cost=0.00..4818.29 rows=83609 width=20) - Index Scan using n_traf_y2007m05_date_time_login_id on n_traf_y2007m05 n_traf (cost=0.00..4721.85 rows=82830 width=20) - Index Scan using n_traf_y2007m06_date_time_login_id on n_traf_y2007m06 n_traf (cost=0.00..4766.56 rows=83357 width=20) - Index Scan using n_traf_y2007m07_date_time_login_id on n_traf_y2007m07 n_traf (cost=0.00..4800.44 rows=83548 width=20) - Index Scan using n_traf_y2007m08_date_time_login_id on n_traf_y2007m08 n_traf (cost=0.00..4787.55 rows=83248 width=20) - Index Scan using n_traf_y2007m09_date_time_login_id on n_traf_y2007m09 n_traf (cost=0.00..4830.67 rows=83389 width=20) - Index Scan using n_traf_y2007m10_date_time_login_id on n_traf_y2007m10 n_traf (cost=0.00..4795.78 rows=82993 width=20) - Index Scan using n_traf_y2007m11_date_time_login_id on n_traf_y2007m11 n_traf (cost=0.00..4754.26 rows=83351 width=20) - Index Scan using n_traf_y2007m12_date_time_login_id on n_traf_y2007m12 n_traf (cost=0.00..4819.51 rows=84028 width=20) (18 rows) index-order-by-wip.patch.gz Description: GNU Zip compressed data ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [PERFORM] partitioned table and ORDER BY indexed_field DESC LIMIT 1
Luke Lonergan wrote: On 11/7/07 10:21 PM, Gregory Stark [EMAIL PROTECTED] wrote: part=# explain SELECT * FROM n_traf ORDER BY date_time LIMIT 1; QUERY PLAN - Limit (cost=198367.14..198367.15 rows=1 width=20) - Sort (cost=198367.14..200870.92 rows=1001510 width=20) Sort Key: public.n_traf.date_time - Result (cost=0.00..57464.92 rows=1001510 width=20) - Append (cost=0.00..57464.92 rows=1001510 width=20) - Index Scan using n_traf_date_time_login_id on n_traf (cost=0.00..66.90 rows=1510 width=20) That looks suspicious. There's likely no good reason to be using the index scan unless it avoids the sort node above the Append node. That's what I hope to do by having the Append executor code do what's necessary to maintain the order. Yah - the way it works in GPDB is that you get a non-sorting plan with an index scan below the parent - that was the point of the fix. Hmm. Unfortunately our plan in GPDB looks exactly the same in this case - so we have a bit of work to do as well! Initially I wondered if I have got something wrong in the patch... and checked on GPDB - only to see the same behaviour! (see prev comment about LIMIT). Cheers Mark ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [PERFORM] partitioned table and ORDER BY indexed_field DESC LIMIT 1
On 11/7/07 10:21 PM, Gregory Stark [EMAIL PROTECTED] wrote: part=# explain SELECT * FROM n_traf ORDER BY date_time LIMIT 1; QUERY PLAN - Limit (cost=198367.14..198367.15 rows=1 width=20) - Sort (cost=198367.14..200870.92 rows=1001510 width=20) Sort Key: public.n_traf.date_time - Result (cost=0.00..57464.92 rows=1001510 width=20) - Append (cost=0.00..57464.92 rows=1001510 width=20) - Index Scan using n_traf_date_time_login_id on n_traf (cost=0.00..66.90 rows=1510 width=20) That looks suspicious. There's likely no good reason to be using the index scan unless it avoids the sort node above the Append node. That's what I hope to do by having the Append executor code do what's necessary to maintain the order. Yah - the way it works in GPDB is that you get a non-sorting plan with an index scan below the parent - that was the point of the fix. Hmm. - Luke ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] partitioned table and ORDER BY indexed_field DESC LIMIT 1
Mark Kirkwood [EMAIL PROTECTED] writes: Here is a (somewhat hurried) self-contained version of the patch under discussion. It applies to 8.2.5 and the resultant code compiles and runs. I've left in some unneeded parallel stuff (PathLocus struct), which I can weed out in a subsequent version if desired. I also removed the 'cdb ' from most of the function names and (I hope) any Greenplum copyrights. Thanks, I'll take a look at it. I discovered that the patch solves a slightly different problem... it pulls up index scans as a viable path choice, (but not for the DESC case) but does not push down the LIMIT to the child tables ... so the actual performance improvement is zero - however hopefully the patch provides useful raw material to help. SET part=# explain SELECT * FROM n_traf ORDER BY date_time LIMIT 1; QUERY PLAN - Limit (cost=198367.14..198367.15 rows=1 width=20) - Sort (cost=198367.14..200870.92 rows=1001510 width=20) Sort Key: public.n_traf.date_time - Result (cost=0.00..57464.92 rows=1001510 width=20) - Append (cost=0.00..57464.92 rows=1001510 width=20) - Index Scan using n_traf_date_time_login_id on n_traf (cost=0.00..66.90 rows=1510 width=20) That looks suspicious. There's likely no good reason to be using the index scan unless it avoids the sort node above the Append node. That's what I hope to do by having the Append executor code do what's necessary to maintain the order. From skimming your patch previously I thought the main point was when there was only one subnode. In that case it was able to pull the subnode entirely out of the append node and pull up the paths of the subnode. In Postgres that would never happen because constraint exclusion will never be able to prune down to a single partition because of the parent table problem but I expect we'll change that. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's On-Demand Production Tuning ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match