Re: [PERFORM] REPOST: Nested loops row estimates always too high
On Thu, 2007-09-20 at 11:02 -0400, Carlo Stonebanks wrote:
> (SORRY FOR THE REPOST, I DON'T SEE MY ORIGINAL QUESTION OR ANY ANSWERS HERE)
> I am noticing that my queries are spending a lot of time in nested loops. The table/index row estimates are not bad, but the nested loops can be off by a factor of 50. In any case, they are always too high. Are the over-estimations below significant, and if so, is this an indication of a general configuration problem?

Sounds much like the issue I was seeing as well.

Unique (cost=67605.91..67653.18 rows=4727 width=16) (actual time=8634.618..8637.918 rows=907 loops=1)

You can try to rewrite the queries as individual queries to see if it helps. In my case, I was doing

    select a.a, b.b, c.c
    from (select a from x where) a      --- put as a SRF
    left join (select b from y where) b --- put as a SRF
      on a.a = b.a
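To make the suggestion concrete, here is a rough sketch of that rewrite with hypothetical table and function names (the WHERE conditions were elided in the original, so simple placeholders stand in for them; the dollar-quoting assumes PostgreSQL 8.0 or later):

    -- Each side of the join becomes a set-returning function running its own small query.
    CREATE FUNCTION srf_x() RETURNS SETOF x AS $$
        SELECT * FROM x WHERE a > 0     -- placeholder for the original filter on x
    $$ LANGUAGE sql STABLE;

    CREATE FUNCTION srf_y() RETURNS SETOF y AS $$
        SELECT * FROM y WHERE b > 0     -- placeholder for the original filter on y
    $$ LANGUAGE sql STABLE;

    -- The outer query then joins the two pre-filtered row sets.
    SELECT a.a, b.b
    FROM srf_x() AS a
    LEFT JOIN srf_y() AS b ON a.a = b.a;

The planner treats each function scan as a black box with a default row estimate, and on the releases discussed here the function's result set is built first and then scanned like a table, so a badly misestimated join selectivity inside the subqueries no longer drives the overall plan choice.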
[OT] Re: [PERFORM] [Again] Postgres performance problem
On Mon, 2007-09-17 at 07:27 -0500, Decibel! wrote:
> Take a look at the stuff at http://decibel.org/~decibel/pervasive/, it'd hopefully provide a useful starting point.

A bit off-track, but I was reading the articles and noticed this at the bottom. Is this a typo or ...

    Making PostreSQL pervasive © 2005 Pervasive Software Inc
               ^
Re: [PERFORM] Searching for the cause of a bad plan
On Fri, 2007-09-21 at 19:30 -0400, Tom Lane wrote:
> Simon Riggs [EMAIL PROTECTED] writes:
> > That's not my perspective. If the LIMIT had been applied accurately to the cost then the hashjoin would never even have been close to the nested join in the first place.
>
> [ shrug... ] Your perspective is mistaken. There is nothing wrong with the way the LIMIT estimation is being done. The plan in question was
>
> Limit (cost=0.00..498511.80 rows=10 width=1804) (actual time=17.729..21.672 rows=2 loops=1)
>   -> Nested Loop (cost=0.00..1794642.48 rows=36 width=1804) (actual time=17.729..21.671 rows=2 loops=1)
>         -> Index Scan using pk_table_a on table_a ta (cost=0.00..324880.88 rows=388638 width=16) (actual time=0.146..0.198 rows=2 loops=1)
>               Index Cond: (a = $1)
>         -> Index Scan using pk_table_b2 on table_b2 tb (cost=0.00..3.77 rows=1 width=1788) (actual time=10.729..10.731 rows=1 loops=2)
>               Index Cond: (ta.b = tb.b)
> Total runtime: 21.876 ms
>
> and there are two fairly serious estimation errors here, neither related at all to the LIMIT:
>
> * five-orders-of-magnitude overestimate of the number of table_a rows that will match the condition on a;

I don't see any problem with this estimate, but I do now agree there is a problem with the other estimate.

We check to see if the value is an MFV, else we assume that the values are uniformly distributed across the histogram bucket. Csaba provided details of the fairly shallow distribution of values of a in table_a. 96% of rows aren't covered by the MFVs, so it's a much shallower distribution than is typical, but still easily possible. So based upon what we know there should be ~330,000 rows with the value of a used for the EXPLAIN.

So it looks to me like we did the best we could with the available information, so I can't see that as a planner problem per se. We cannot do better a priori without risking worse plans in other circumstances.

> * enormous underestimate of the number of join rows --- it's apparently thinking only 0.0001 of the table_a rows will have a join partner, whereas at least for this case they all do.

OK, I agree this estimate does have a problem and it has nothing to do with LIMIT.

Looking at the code I can't see how this selectivity can have been calculated. AFAICS eqjoinsel() gives a selectivity of 1.0 using the data supplied by Csaba, and it ought to cover this case reasonably well.

Csaba, please can you copy that data into fresh tables, re-ANALYZE and then re-post the EXPLAINs, with stats data.

-- Simon Riggs 2ndQuadrant http://www.2ndQuadrant.com
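As an aside, the per-column statistics the planner is working from here (the MCV list and the histogram) can be inspected directly; a minimal sketch, reusing the table and column names from the plan above:

    -- Show what ANALYZE collected for table_a.a: distinct-value estimate,
    -- the "most common values" (MCVs) with their frequencies, and the histogram bounds.
    SELECT null_frac, n_distinct, most_common_vals, most_common_freqs, histogram_bounds
    FROM pg_stats
    WHERE tablename = 'table_a' AND attname = 'a';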
Re: [PERFORM] Low CPU Usage
Hi Greg, this is my Bonnie result.

    Version 1.03     ------Sequential Output------ --Sequential Input- --Random-
                     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
    Machine     Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
    insaubi       8G 25893  54 26762   9 14146   3 36846  68 43502   3 102.8   0
                     ------Sequential Create------ --------Random Create--------
                     -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
               files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                  16     + +++     + +++     + +++     + +++     + +++     + +++
    insaubi,8G,25893,54,26762,9,14146,3,36846,68,43502,3,102.8,0,16,+,+++,+,+++,+,+++,+,+++,+,+++,+,+++

If I compare this against my laptop (SATA disk too) it is really better, but I don't know if this result is a good one or not. I don't know where to continue looking for the cause of the problem; I think there is a bug or something misconfigured with Debian 4.0r1 and Postgres. I unplugged the server from the network with the same results. I have the server mapped as localhost in PgAdmin III, so there shouldn't be network traffic, and there isn't (monitoring the network interface). I'm really lost with this weird behaviour.

I really appreciate your help.

Regards
Agustin

----- Original Message -----
From: Greg Smith [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
CC: pgsql-performance@postgresql.org
Sent: Saturday, September 22, 2007, 3:29:17
Subject: Re: [PERFORM] Low CPU Usage

On Thu, 20 Sep 2007, [EMAIL PROTECTED] wrote:
> Which other test can I do to find if this is a hardware, kernel or postgres issue?

The little test hdparm does is not exactly a robust hard drive benchmark. If you want to rule out hard drive transfer speed issues, take a look at the tests suggested at http://www.westnet.com/~gsmith/content/postgresql/pg-disktesting.htm and see how your results compare to the single SATA disk example I give there.

-- * Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD
Re: [PERFORM] Searching for the cause of a bad plan
On Mon, 2007-09-24 at 14:27 +0100, Simon Riggs wrote:
> Csaba, please can you copy that data into fresh tables, re-ANALYZE and then re-post the EXPLAINs, with stats data.

Well, I can of course. I actually tried to generate some random data with similar record count and relations between the tables (which I'm not sure I succeeded at), without the extra columns, but it was happily yielding the nested loop plan. So I guess I really have to copy the whole data (several tens of GB).

But from my very limited understanding of what information is available for the planner, I thought that the record count estimated for the join between table_a and table_b1 on column b should be something like:

    (estimated record count in table_a for value a)
      * (weight of b range covered by table_b1 and table_a in common)
      / (weight of b range covered by table_a)

This is if the b values in table_a weren't correlated at all with the content of table_b2. The reality is that they are, but the planner has no information about that. I have no idea how the planner works though, so this might be totally off...

I will copy the data and send the results (not promising though that it will be today).

Cheers,
Csaba.
[PERFORM] select count(*) performance (vacuum did not help)
hi,

i have the following table:

    CREATE TABLE main_activity (
        id serial NOT NULL PRIMARY KEY,
        user_id integer NOT NULL,
        sessionid varchar(128) NOT NULL,
        login timestamp with time zone NOT NULL,
        activity timestamp with time zone NOT NULL,
        logout timestamp with time zone NULL
    )

the problem is that it contains around 2 entries, and a select count(*) takes around 2 minutes. that's too slow.

some background info:

- this table has a lot of updates and inserts; it works very similarly to a session-table for a web-application
- there is a cron-job that deletes all the old entries, so its size is roughly between 15000 and 35000 entries (it's run daily, and every day deletes around 1 entries)
- but in the past, the cron-job was not in place, so the table's size grew to around 80 entries (in around 80 days)
- then we removed the old entries, added the cronjob, vacuumed + analyzed the table, and the count(*) is still slow
- the output of the vacuum+analyze is:

    INFO: vacuuming public.main_activity
    INFO: index main_activity_pkey now contains 11675 row versions in 57301 pages
    DETAIL: 41001 index row versions were removed.
    56521 index pages have been deleted, 2 are currently reusable.
    CPU 1.03s/0.27u sec elapsed 56.08 sec.
    INFO: index main_activity_user_id now contains 11679 row versions in 41017 pages
    DETAIL: 41001 index row versions were removed.
    37736 index pages have been deleted, 2 are currently reusable.
    CPU 0.70s/0.42u sec elapsed 62.04 sec.
    INFO: main_activity: removed 41001 row versions in 4310 pages
    DETAIL: CPU 0.15s/0.37u sec elapsed 20.48 sec.
    INFO: main_activity: found 41001 removable, 11672 nonremovable row versions in 160888 pages
    DETAIL: 0 dead row versions cannot be removed yet.
    There were 14029978 unused item pointers.
    0 pages are entirely empty.
    CPU 5.53s/1.71u sec elapsed 227.35 sec.
    INFO: analyzing public.main_activity
    INFO: main_activity: 160888 pages, 4500 rows sampled, 4594 estimated total rows

(please note the 4594 estimated total rows... the row-count should be around 15000)

- this is on postgresql 7.4.8. yes, i know it's too old, and currently we are preparing a migration to postgres8.1 (or 8.2, i'm not sure yet), but for now i have to solve the problem on this database

thanks a lot,
gabor
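As a quick illustration of how to see this kind of bloat from the system catalogs (the figures shown are only as current as the last VACUUM/ANALYZE):

    -- Compare pages on disk with live rows: ~160888 pages holding only ~15000 live rows
    -- means nearly every page is dead space, so the seq scan behind count(*) reads it all.
    SELECT relname, relpages, reltuples
    FROM pg_class
    WHERE relname = 'main_activity';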
Re: [PERFORM] Low CPU Usage
I have found the reason!!! I went through postgresql.conf line by line and saw ssl = true. I have disabled ssl and then restarted the server, and that's all: it's 4 or 5 times faster than the old server. I don't understand why PgAdmin is connecting using ssl if I have left this field empty!!! Debian by default installs Postgres with ssl enabled.

Thank you very much, all of you, for helping me find the cause.

Regards
Agustin
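For readers who want to check the same thing on their own server, a hedged sketch (SHOW works on the versions discussed in this thread; the postgresql.conf location varies by packaging, on Debian typically under /etc/postgresql/<version>/main/):

    -- From any client session: confirm whether SSL is currently enabled on the server.
    SHOW ssl;                -- 'on'/'true' means TCP connections may be negotiating SSL
    -- The fix described above is then made in postgresql.conf:
    --     ssl = false       -- Debian's default config ships with ssl = true
    -- followed by a server restart for the change to take effect.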
Re: [PERFORM] select count(*) performance (vacuum did not help)
On 9/24/07, Gábor Farkas [EMAIL PROTECTED] wrote:
> INFO: main_activity: found 41001 removable, 11672 nonremovable row versions in 160888 pages
> DETAIL: 0 dead row versions cannot be removed yet.
> There were 14029978 unused item pointers.
> 0 pages are entirely empty.
> CPU 5.53s/1.71u sec elapsed 227.35 sec.
> INFO: analyzing public.main_activity
> INFO: main_activity: 160888 pages, 4500 rows sampled, 4594 estimated total rows

Looking at the number of rows vs number of pages, ISTM that VACUUM FULL should help you.

Thanks,
Pavan

-- Pavan Deolasee EnterpriseDB http://www.enterprisedb.com
Re: [PERFORM] select count(*) performance (vacuum did not help)
Gábor Farkas wrote:
> - this table has a lot of updates and inserts; it works very similarly to a session-table for a web-application

Make sure you run VACUUM often enough.

> - there is a cron-job that deletes all the old entries, so its size is roughly between 15000 and 35000 entries (it's run daily, and every day deletes around 1 entries)

Running vacuum after these deletes to immediately reclaim the dead space would also be a good idea.

> - but in the past, the cron-job was not in place, so the table's size grew to around 80 entries (in around 80 days)

That bloated your table, so that there's still a lot of empty pages in it. VACUUM FULL should bring it back to a reasonable size. Regular normal non-FULL VACUUMs should keep it in shape after that.

-- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
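A minimal sketch of the sequence being suggested, using the table name from the original post (VACUUM FULL takes an exclusive lock, so it is best run in a quiet period):

    -- One-time compaction of the bloated table (exclusive lock while it runs).
    VACUUM FULL VERBOSE main_activity;
    -- On old releases such as the 7.4 server in this thread, VACUUM FULL can leave
    -- (or even worsen) index bloat, so rebuilding the indexes afterwards is common advice.
    REINDEX TABLE main_activity;
    -- Refresh the planner statistics once the table has been rewritten.
    ANALYZE main_activity;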
Re: [PERFORM] select count(*) performance (vacuum did not help)
Heikki Linnakangas wrote:
> Gábor Farkas wrote:
> > - but in the past, the cron-job was not in place, so the table's size grew to around 80 entries (in around 80 days)
>
> That bloated your table, so that there's still a lot of empty pages in it. VACUUM FULL should bring it back to a reasonable size. Regular normal non-FULL VACUUMs should keep it in shape after that.

hmm... can a full-vacuum be performed while the database is still live (i mean serving requests)? will the db still be able to respond to queries?

or in a different way: if i do a full vacuum to that table only, will the database still serve data from the other tables at a normal speed?

thanks,
gabor
Re: [PERFORM] select count(*) performance (vacuum did not help)
Gábor Farkas wrote:
> hmm... can a full-vacuum be performed while the database is still live (i mean serving requests)? will the db still be able to respond to queries?

VACUUM FULL will exclusive lock the table, which means that other queries accessing it will block and wait until it's finished.

> or in a different way: if i do a full vacuum to that table only, will the database still serve data from the other tables at a normal speed?

Yes. The extra I/O load vacuum full generates while it's running might disrupt other activity, though.

-- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Re: [PERFORM] select count(*) performance (vacuum did not help)
Heikki Linnakangas wrote:
> Gábor Farkas wrote:
> > if i do a full vacuum to that table only, will the database still serve data from the other tables at a normal speed?
>
> Yes. The extra I/O load vacuum full generates while it's running might disrupt other activity, though.

i see.

will i achieve the same thing by simply dropping that table and re-creating it?

gabor
[PERFORM] TEXT or LONGTEXT?
Hello,

I have a database with a number of tables, and in several of them I have an attribute for a semantic definition, for which I use a field of type text. I am trying to decide whether it would be worth using LONGTEXT instead of TEXT, as maybe it would slow down data insertion and extraction. I hope that you can help me. Thank you.

-- Fabiola Fernández Gutiérrez
Grupo de Ingeniería Biomédica
Escuela Superior de Ingeniería
Camino de los Descubrimientos, s/n
Isla de la Cartuja
41092 Sevilla (Spain)
Phone: +34 954487399
E-mail: [EMAIL PROTECTED]
Re: [PERFORM] select count(*) performance (vacuum did not help)
In response to Gábor Farkas [EMAIL PROTECTED]:
> Heikki Linnakangas wrote:
> > Yes. The extra I/O load vacuum full generates while it's running might disrupt other activity, though.
>
> i see.
>
> will i achieve the same thing by simply dropping that table and re-creating it?

Yes. Once you've done so, keep up the vacuum schedule you've already established. You may want to (as has already been suggested) explicitly vacuum this table after large delete operations as well.

-- Bill Moran
Collaborative Fusion Inc.
http://people.collaborativefusion.com/~wmoran/
[EMAIL PROTECTED]
Phone: 412-422-3463x4023
Re: [PERFORM] TEXT or LONGTEXT?
On 9/24/07, Fabiola Fernández [EMAIL PROTECTED] wrote:
> I have a database with a number of tables, and in several of them I have an attribute for a semantic definition, for which I use a field of type text. I am trying to decide whether it would be worth using LONGTEXT instead of TEXT, as maybe it would slow down data insertion and extraction. I hope that you can help me. Thank you.

Easy choice -- PostgreSQL does not have a data type named longtext.

Alexander.
Re: [PERFORM] select count(*) performance (vacuum did not help)
On Mon, 2007-09-24 at 17:14 +0200, Gábor Farkas wrote:
> will i achieve the same thing by simply dropping that table and re-creating it?

If you have an index/PK on that table, the fastest and most useful way to rebuild it is to do CLUSTER on that index. That will be a lot faster than VACUUM FULL and it will also order your table in index order... but it will also lock it in exclusive mode just as VACUUM FULL would do it. If your table has just a few live rows and lots of junk in it, CLUSTER should be fast enough. With 20K entries I would expect it to be fast enough not to be a problem...

Cheers,
Csaba.
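A short sketch of that suggestion, using the table and primary-key index names from the original post (older releases such as the 7.4/8.x servers in this thread use the "CLUSTER index ON table" spelling; newer ones write "CLUSTER table USING index"):

    -- Rewrite main_activity in primary-key order, discarding dead space (exclusive lock).
    CLUSTER main_activity_pkey ON main_activity;
    -- Refresh statistics after the rewrite.
    ANALYZE main_activity;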
Re: [PERFORM] select count(*) performance (vacuum did not help)
-----Original Message-----
From: Gábor Farkas
> i see. will i achieve the same thing by simply dropping that table and re-creating it?

Yes. Or even easier (if you don't need the data anymore) you can use the TRUNCATE command, which deletes everything in the table, including dead rows.

Dave
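As a one-line illustration of that option (only sensible if the current session rows really are disposable, since live rows are removed as well):

    -- Instantly empties the table and releases its bloated space; all rows are lost.
    TRUNCATE TABLE main_activity;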
Re: [PERFORM] TEXT or LONGTEXT?
On 24 Sep 2007, at 17.21, Fabiola Fernández wrote:
> I am trying to decide whether it would be worth using LONGTEXT instead of TEXT, as maybe it would slow down data insertion and extraction.

Postgres doesn't have a LONGTEXT datatype, so keep using TEXT.

http://www.postgresql.org/docs/8.2/interactive/datatype-character.html

Sincerely,
Niklas Johansson
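For completeness, a hedged sketch of the kind of declaration the original poster can simply keep (hypothetical table and column names); PostgreSQL's text type already stores values up to roughly 1 GB, so there is no "longer" variant to switch to:

    -- Hypothetical example: a semantic-definition attribute kept as plain TEXT.
    CREATE TABLE concept (
        id          serial PRIMARY KEY,
        name        varchar(200) NOT NULL,
        definition  text        -- arbitrary-length text; LONGTEXT does not exist in PostgreSQL
    );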
Re: [PERFORM] Searching for the cause of a bad plan
On Mon, 2007-09-24 at 16:04 +0200, Csaba Nagy wrote:
> On Mon, 2007-09-24 at 14:27 +0100, Simon Riggs wrote:
> > Csaba, please can you copy that data into fresh tables, re-ANALYZE and then re-post the EXPLAINs, with stats data.
>
> Well, I can of course. I actually tried to generate some random data with similar record count and relations between the tables (which I'm not sure I succeeded at), without the extra columns, but it was happily yielding the nested loop plan. So I guess I really have to copy the whole data (several tens of GB).
>
> But from my very limited understanding of what information is available for the planner, I thought that the record count estimated for the join between table_a and table_b1 on column b should be something like:
>
>     (estimated record count in table_a for value a)
>       * (weight of b range covered by table_b1 and table_a in common)
>       / (weight of b range covered by table_a)

There's no such code I'm aware of. Sounds a good idea though. I'm sure we could do something with the histogram values, but we don't in the default selectivity functions.

-- Simon Riggs 2ndQuadrant http://www.2ndQuadrant.com
Re: [PERFORM] REPOST: Nested loops row estimates always too high
Has anyone offered any answers to you? No one else has replied to this post.

Ow Mun Heng [EMAIL PROTECTED] wrote in message news:[EMAIL PROTECTED]
> On Thu, 2007-09-20 at 11:02 -0400, Carlo Stonebanks wrote:
> > (SORRY FOR THE REPOST, I DON'T SEE MY ORIGINAL QUESTION OR ANY ANSWERS HERE)
> > I am noticing that my queries are spending a lot of time in nested loops. The table/index row estimates are not bad, but the nested loops can be off by a factor of 50. In any case, they are always too high. Are the over-estimations below significant, and if so, is this an indication of a general configuration problem?
>
> Sounds much like the issue I was seeing as well.
>
> Unique (cost=67605.91..67653.18 rows=4727 width=16) (actual time=8634.618..8637.918 rows=907 loops=1)
>
> You can try to rewrite the queries as individual queries to see if it helps. In my case, I was doing
>
>     select a.a, b.b, c.c
>     from (select a from x where) a      --- put as a SRF
>     left join (select b from y where) b --- put as a SRF
>       on a.a = b.a
[PERFORM] Acceptable level of over-estimation?
Is there a rule of thumb about when the planner's row estimates are too high? In particular, when should I be concerned that the planner's estimated number of rows for a nested loop is off? By a factor of 10? 100? 1000?

Carlo
Re: [PERFORM] Acceptable level of over-estimation?
Carlo Stonebanks [EMAIL PROTECTED] writes:
> Is there a rule of thumb about when the planner's row estimates are too high? In particular, when should I be concerned that the planner's estimated number of rows for a nested loop is off? By a factor of 10? 100? 1000?

Not really. It's a big enough difference for the planner to make a bad decision or it isn't. But if you pressed me I would say a factor of 10 is bad. A factor of 2 is inevitable in some cases.

-- Gregory Stark EnterpriseDB http://www.enterprisedb.com
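For readers new to the thread, the ratio in question is the one read straight off EXPLAIN ANALYZE output; a purely illustrative (made-up) example of where estimate and reality appear:

    -- Hypothetical query; the plan line below is illustrative, not taken from this thread.
    EXPLAIN ANALYZE
    SELECT o.id, l.qty
    FROM orders o
    JOIN order_lines l ON l.order_id = o.id
    WHERE o.customer_id = 42;
    --   Nested Loop  (cost=0.00..1234.56 rows=4800 width=12)
    --                (actual time=0.05..3.21 rows=52 loops=1)
    -- "rows=4800" is the planner's estimate, "rows=52" is reality: off by roughly 90x,
    -- which by the rule of thumb above is well past the point worth investigating.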
Re: [PERFORM] REPOST: Nested loops row estimates always too high
On Mon, 2007-09-24 at 14:12 -0400, Carlo Stonebanks wrote:
> Has anyone offered any answers to you? No one else has replied to this post.

Overestimate of selectivity. I guess it's mainly due to my one-to-many table relationships. I've tried everything from concatenated join columns and indexing it, to creating all sorts of indexes and splitting the (1) tables into multiple tables and upping the indexes to 1000, and turning off nestloops / enabling geqo / tweaking the threshold/effort, and much much more (as much as I was asked to / suggested to), but still no luck.

In my case, the individual queries were fast. So, in the end, I made a SRF and used the SRFs to join each other. This worked better.
Re: [PERFORM] REPOST: Nested loops row estimates always too high
My problem is that I think the SRFs are causing my problems. An SRF gets an automatic row estimate of 1000 rows; add a condition to it, and the planner guesses 333 rows. Even at 333, this is an overestimate of the number of rows returned. I'm really disappointed - SRFs are a great way to place the enterprise's db-centric business logic at the server.

Carlo

-----Original Message-----
From: Ow Mun Heng [mailto:[EMAIL PROTECTED]
Sent: September 24, 2007 8:51 PM
To: Carlo Stonebanks
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] REPOST: Nested loops row estimates always too high
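For what it's worth, releases newer than the ones discussed in this thread (PostgreSQL 8.3 and later) let the function author override that default estimate; a hedged sketch with hypothetical names:

    -- Hypothetical SRF wrapping one side of the join. The ROWS clause (8.3+) tells the
    -- planner to expect about 25000 rows instead of the built-in default estimate of 1000.
    CREATE FUNCTION srf_x_for(p_a integer) RETURNS SETOF x AS $$
        SELECT * FROM x WHERE a = p_a
    $$ LANGUAGE sql STABLE ROWS 25000;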
Re: [PERFORM] REPOST: Nested loops row estimates always too high
On Tue, 2007-09-25 at 00:53 -0400, Carlo Stonebanks wrote:
> My problem is that I think the SRFs are causing my problems. An SRF gets an automatic row estimate of 1000 rows.

That's correct. That's what I see too, though I may return 10K rows of data (min 10 columns). But it's way faster than the normal joins I do.

> I'm really disappointed - SRFs are a great way to place the enterprise's db-centric business logic at the server.

Actually, I think that in general nested loops, while evil, are just going to be around. Even in MSSQL, when I'm pulling from it, the nested loops are many, and I presume it's cos of the 8x SMP and the multi-GB RAM which is making the query better.