Re: [PERFORM] REPOST: Nested loops row estimates always too high

2007-09-24 Thread Ow Mun Heng
On Thu, 2007-09-20 at 11:02 -0400, Carlo Stonebanks wrote:
 (SORRY FOR THE REPOST, I DON'T SEE MY ORIGINAL QUESTION OR ANY ANSWERS HERE)
 
 I am noticing that my queries are spending a lot of time in nested loops. 
 The table/index row estimates are not bad, but the nested loops can be off 
 by a factor of 50. In any case, they are always too high.
 
 Are the over-estimations below significant, and if so, is this an indication 
 of a general configuration problem?
Sounds much like the issue I was seeing as well.

 
 Unique  (cost=67605.91..67653.18 rows=4727 width=16) (actual time=8634.618..8637.918 rows=907 loops=1)

You can try rewriting the queries as individual queries to see if it helps.

In my case, I was doing

select a.a,b.b,c.c from 
(select a from x where) a --- Put as a SRF
left join (
select b from y where ) b --- Put as a SRF
on a.a = b.a
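
A minimal sketch of that kind of rewrite, using hypothetical tables x and y and
made-up filter conditions (plain SQL set-returning functions standing in for the
SRFs mentioned above):

CREATE FUNCTION x_subset() RETURNS SETOF x AS $$
    SELECT * FROM x WHERE some_filter;       -- hypothetical filter on x
$$ LANGUAGE sql STABLE;

CREATE FUNCTION y_subset() RETURNS SETOF y AS $$
    SELECT * FROM y WHERE other_filter;      -- hypothetical filter on y
$$ LANGUAGE sql STABLE;

-- then join the two functions instead of the inline subqueries
SELECT a.a, b.b
FROM x_subset() AS a
LEFT JOIN y_subset() AS b ON a.a = b.a;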




---(end of broadcast)---
TIP 6: explain analyze is your friend


[OT] Re: [PERFORM] [Again] Postgres performance problem

2007-09-24 Thread Ow Mun Heng
On Mon, 2007-09-17 at 07:27 -0500, Decibel! wrote:

 Take a look at the stuff at http://decibel.org/~decibel/pervasive/, it'd
 hopefully provide a useful starting point.


A bit off-track, but I was reading the articles and noticed this at the
bottom. Is this a typo or ...


Making PostreSQL pervasive© 2005 Pervasive Software Inc
       ^^^^^^^^^



---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] Searching for the cause of a bad plan

2007-09-24 Thread Simon Riggs
On Fri, 2007-09-21 at 19:30 -0400, Tom Lane wrote:
 Simon Riggs [EMAIL PROTECTED] writes:
  That's not my perspective. If the LIMIT had been applied accurately to
  the cost then the hashjoin would never even have been close to the
  nested join in the first place.
 
 [ shrug... ]  Your perspective is mistaken.  There is nothing wrong with
 the way the LIMIT estimation is being done.  The plan in question was
 
 Limit  (cost=0.00..498511.80 rows=10 width=1804) (actual time=17.729..21.672 rows=2 loops=1)
   ->  Nested Loop  (cost=0.00..1794642.48 rows=36 width=1804) (actual time=17.729..21.671 rows=2 loops=1)
         ->  Index Scan using pk_table_a on table_a ta  (cost=0.00..324880.88 rows=388638 width=16) (actual time=0.146..0.198 rows=2 loops=1)
               Index Cond: (a = $1)
         ->  Index Scan using pk_table_b2 on table_b2 tb  (cost=0.00..3.77 rows=1 width=1788) (actual time=10.729..10.731 rows=1 loops=2)
               Index Cond: (ta.b = tb.b)
 Total runtime: 21.876 ms
 
 and there are two fairly serious estimation errors here, neither related
 at all to the LIMIT:
 
 * five-orders-of-magnitude overestimate of the number of table_a rows
 that will match the condition on a;

I don't see any problem with this estimate, but I do now agree there is
a problem with the other estimate.

We check to see if the value is an MFV, otherwise we assume that the
values are uniformly distributed across the histogram bucket.

Csaba provided details of the fairly shallow distribution of values of a
in table_a. 96% of rows aren't covered by the MFVs, so it's a much
shallower distribution than is typical, but still easily possible. So
based upon what we know there should be ~330,000 rows with the value of
a used for the EXPLAIN.

So it looks to me like we did the best we could with the available
information, so I can't see that as a planner problem per se. We cannot
do better a priori without risking worse plans in other circumstances.
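
For anyone following along, a hedged sketch of where a number like that comes
from (table and column names as used in this thread):

SELECT n_distinct, most_common_vals, most_common_freqs, histogram_bounds
FROM pg_stats
WHERE tablename = 'table_a' AND attname = 'a';

-- For a value that is not in most_common_vals, the planner roughly assumes
--   selectivity = (1 - sum of most_common_freqs) / (n_distinct - number of MCV entries)
-- and multiplies that by the table's estimated row count, which is how an
-- estimate in the region of ~330,000 rows for this column can arise.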

 * enormous underestimate of the number of join rows --- it's apparently
 thinking only 0.0001 of the table_a rows will have a join partner,
 whereas at least for this case they all do.

OK, I agree this estimate does have a problem and it has nothing to do
with LIMIT.

Looking at the code I can't see how this selectivity can have been
calculated. AFAICS eqjoinsel() gives a selectivity of 1.0 using the data
supplied by Csaba and it ought to cover this case reasonably well.

Csaba, please can you copy that data into fresh tables, re-ANALYZE and
then re-post the EXPLAINs, with stats data.
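
A minimal sketch of those steps (names follow the thread; the copies would also
need the same indexes recreated for the plans to be comparable):

CREATE TABLE table_a_fresh  AS SELECT * FROM table_a;
CREATE TABLE table_b2_fresh AS SELECT * FROM table_b2;
-- recreate the primary keys / indexes used by the original plan here
ANALYZE table_a_fresh;
ANALYZE table_b2_fresh;
-- re-run EXPLAIN ANALYZE for the original query against the fresh tables, then:
SELECT tablename, attname, n_distinct, most_common_vals, most_common_freqs
FROM pg_stats
WHERE tablename IN ('table_a_fresh', 'table_b2_fresh');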

-- 
  Simon Riggs
  2ndQuadrant  http://www.2ndQuadrant.com


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] Low CPU Usage

2007-09-24 Thread brauagustin-susc
Hi Greg, this is my Bonnie result.

Version  1.03   --Sequential Output-- --Sequential Input- --Random-
-Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
MachineSize K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
insaubi  8G 25893  54 26762   9 14146   3 36846  68 43502   3 102.8   0
--Sequential Create-- Random Create
-Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
  files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
 16 + +++ + +++ + +++ + +++ + +++ + +++
insaubi,8G,25893,54,26762,9,14146,3,36846,68,43502,3,102.8,0,16,+,+++,+,+++,+,+++,+,+++,+,+++,+,+++

If I compare this against my laptop (also a SATA disk) it is really better, but I 
don't know if this result is a good one or not.
I don't know where to continue looking for the cause of the problem; I think 
there is a bug or something misconfigured with Debian 4.0r1 and Postgres.
I unplugged the server from the network with the same results. I have the 
server mapped as localhost in PgAdmin III, so there shouldn't be network traffic 
and there isn't (monitoring the network interface). I'm really lost with this 
weird behaviour.
I really appreciate your help.
Regards
Agustin

----- Original Message -----
From: Greg Smith [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
CC: pgsql-performance@postgresql.org
Sent: Saturday, September 22, 2007, 3:29:17
Subject: Re: [PERFORM] Low CPU Usage

On Thu, 20 Sep 2007, [EMAIL PROTECTED] wrote:

 Which other test can I do to find if this is a hardware, kernel or 
 postgres issue?

The little test hdparm does is not exactly a robust hard drive benchmark. 
If you want to rule out hard drive transfer speed issues, take a look at 
the tests suggested at 
http://www.westnet.com/~gsmith/content/postgresql/pg-disktesting.htm and 
see how your results compare to the single SATA disk example I give there.

--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD








Re: [PERFORM] Searching for the cause of a bad plan

2007-09-24 Thread Csaba Nagy
On Mon, 2007-09-24 at 14:27 +0100, Simon Riggs wrote:
 Csaba, please can you copy that data into fresh tables, re-ANALYZE and
 then re-post the EXPLAINs, with stats data.

Well, I can of course. I actually tried to generate some random data
with similar record count and relations between the tables (which I'm
not sure I succeeded at), without the extra columns, but it was happily
yielding the nested loop plan. So I guess I really have to copy the
whole data (several tens of GB).

But from my very limited understanding of what information is available
for the planner, I thought that the record count estimated for the join
between table_a and table_b1 on column b should be something like

(estimated record count in table_a for value a) * (weight of b range
covered by table_b1 and table_a in common) / (weight of b range
covered by table_a)

This would hold if the b values in table_a were not correlated at all with
the content of table_b2. The reality is that they are, but the planner
has no information about that.

I have no idea how the planner works though, so this might be totally
off...

I will copy the data and send the results (not promising though that it
will be today).

Cheers,
Csaba.



---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


[PERFORM] select count(*) performance (vacuum did not help)

2007-09-24 Thread Gábor Farkas

hi,

i have the following table:

CREATE TABLE main_activity (
id serial NOT NULL PRIMARY KEY,
user_id integer NOT NULL,
sessionid varchar(128) NOT NULL,
login timestamp with time zone NOT NULL,
activity timestamp with time zone NOT NULL,
logout timestamp with time zone NULL
)

the problem is that it contains around 20000 entries, and a select 
count(*) takes around 2 minutes. that's too slow.


some background info:

- this table has a lot of updates and inserts, it works very similarly 
to a session-table for a web-application


- there is a cron-job that deletes all the old entries, so its size is 
roughly between 15000 and 35000 entries (it's run daily, and every day 
deletes around 10000 entries)

- but in the past, the cron-job was not in place, so the table's size 
grew to around 800000 entries (in around 80 days)


- then we removed the old entries, added the cronjob, vacuumed + 
analyzed the table, and the count(*) is still slow


- the output of the vacuum+analyze is:

INFO:  vacuuming public.main_activity
INFO:  index main_activity_pkey now contains 11675 row versions in 57301 pages
DETAIL:  41001 index row versions were removed.
56521 index pages have been deleted, 2 are currently reusable.
CPU 1.03s/0.27u sec elapsed 56.08 sec.
INFO:  index main_activity_user_id now contains 11679 row versions in 41017 pages
DETAIL:  41001 index row versions were removed.
37736 index pages have been deleted, 2 are currently reusable.
CPU 0.70s/0.42u sec elapsed 62.04 sec.
INFO:  main_activity: removed 41001 row versions in 4310 pages
DETAIL:  CPU 0.15s/0.37u sec elapsed 20.48 sec.
INFO:  main_activity: found 41001 removable, 11672 nonremovable row versions in 160888 pages
DETAIL:  0 dead row versions cannot be removed yet.
There were 14029978 unused item pointers.
0 pages are entirely empty.
CPU 5.53s/1.71u sec elapsed 227.35 sec.
INFO:  analyzing public.main_activity
INFO:  main_activity: 160888 pages, 4500 rows sampled, 4594 estimated total rows


(please note the 4594 estimated total rows... the actual row count 
should be around 15000)
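
A quick way to see the same bloat figure without another vacuum run is to query
pg_class directly (a hedged sketch):

SELECT relname, relpages, reltuples
FROM pg_class
WHERE relname = 'main_activity';
-- relpages is the heap size in 8 kB pages; 160888 pages is over a gigabyte of
-- table for ~15000 live rows, and a sequential count(*) has to read all of it.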


- this is on postgresql 7.4.8. yes, i know it's too old, and currently 
we are preparing a migration to postgres 8.1 (or 8.2, i'm not sure yet), 
but for now i have to solve the problem on this database

thanks a lot,

gabor

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] Low CPU Usage

2007-09-24 Thread brauagustin-susc
I have found the reason!!! I began to go through postgresql.conf line by line and 
saw ssl = true.
I disabled ssl and restarted the server, and that's all: it's 4 or 5 times faster 
than the old server.
I don't understand why PgAdmin is connecting using ssl if I have left this 
field empty!!!
Debian by default installs Postgres with ssl enabled.
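
For anyone hitting the same thing, a hedged sketch of the knobs involved (the
connection string is just an example):

SHOW ssl;   -- run in psql: reports whether the server has SSL turned on
-- server side: set  ssl = off  in postgresql.conf and restart, as described above
-- client side: libpq-based clients can refuse SSL per connection with sslmode=disable,
--   e.g. a connection string like  host=localhost dbname=mydb sslmode=disable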
Thank you very much, all of you, for helping me to find the cause.
Regards
Agustin


Re: [PERFORM] select count(*) performance (vacuum did not help)

2007-09-24 Thread Pavan Deolasee
On 9/24/07, Gábor Farkas [EMAIL PROTECTED] wrote:


 INFO:  main_activity: found 41001 removable, 11672 nonremovable row versions in 160888 pages
 DETAIL:  0 dead row versions cannot be removed yet.
 There were 14029978 unused item pointers.
 0 pages are entirely empty.
 CPU 5.53s/1.71u sec elapsed 227.35 sec.
 INFO:  analyzing public.main_activity
 INFO:  main_activity: 160888 pages, 4500 rows sampled, 4594 estimated total rows


Looking at the number of rows vs number of pages, ISTM that VACUUM FULL
should help you.

Thanks,
Pavan

-- 
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com


Re: [PERFORM] select count(*) performance (vacuum did not help)

2007-09-24 Thread Heikki Linnakangas
Gábor Farkas wrote:
 - this table has a lot of updates and inserts, it works very similarly
 to a session-table for a web-application

Make sure you run VACUUM often enough.

 - there is a cron-job that deletes all the old entries, so its size is
 roughly between 15000 and 35000 entries (it's run daily, and every day
 deletes around 10000 entries)

Running vacuum after these deletes to immediately reclaim the dead space
would also be a good idea.
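
A minimal sketch of that, assuming the cleanup job issues plain SQL (the
retention rule here is made up):

DELETE FROM main_activity WHERE activity < now() - interval '30 days';
VACUUM ANALYZE main_activity;  -- reclaim the dead rows right away and refresh statistics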

 - but in the past, the cron-job was not in place, so the table's size
 grew to around 800000 entries (in around 80 days)

That bloated your table, so that there's still a lot of empty pages in
it. VACUUM FULL should bring it back to a reasonable size. Regular
normal non-FULL VACUUMs should keep it in shape after that.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] select count(*) performance (vacuum did not help)

2007-09-24 Thread Gábor Farkas

Heikki Linnakangas wrote:

Gábor Farkas wrote:

- but in the past, the cron-job was not in place, so the table's size
grew to around 800000 entries (in around 80 days)


That bloated your table, so that there's still a lot of empty pages in
it. VACUUM FULL should bring it back to a reasonable size. Regular
normal non-FULL VACUUMs should keep it in shape after that.



hmm... can a full-vacuum be performed while the database is still live 
(i mean serving requests)?


will the db still be able to respond to queries?

or in a different way:

if i do a full vacuum to that table only, will the database still serve 
data from the other tables at a normal speed?


thanks,
gabor

---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [PERFORM] select count(*) performance (vacuum did not help)

2007-09-24 Thread Heikki Linnakangas
Gábor Farkas wrote:
 hmm... can a full-vacuum be performed while the database is still live
 (i mean serving requests)?
 
 will the db still be able to respond to queries?

VACUUM FULL will exclusive lock the table, which means that other
queries accessing it will block and wait until it's finished.

 or in a different way:
 
 if i do a full vacuum to that table only, will the database still serve
 data from the other tables at a normal speed?

Yes. The extra I/O load vacuum full generates while it's running might
disrupt other activity, though.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] select count(*) performance (vacuum did not help)

2007-09-24 Thread Gábor Farkas

Heikki Linnakangas wrote:

Gábor Farkas wrote:


if i do a full vacuum to that table only, will the database still serve
data from the other tables at a normal speed?


Yes. The extra I/O load vacuum full generates while it's running might
disrupt other activity, though.



i see.

will i achieve the same thing by simply dropping that table and 
re-creating it?


gabor

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


[PERFORM] TEXT or LONGTEXT?

2007-09-24 Thread Fabiola Fernández
Hello,

I have a database with a number of tables, and in several of them I have an
attribute for a semantic definition, for which I use a field of type text. I
am trying to decide if it would be worth using LONGTEXT instead of TEXT, as
maybe it would slow down the data insertion and extraction. I hope that you
can help me. Thank you.

-- 
Fabiola Fernández Gutiérrez
Grupo de Ingeniería Biomédica
Escuela Superior de Ingeniería
Camino de los Descubrimientos, s/n
Isla de la Cartuja
41092 Sevilla (Spain)
Tfno: +34 954487399
E-mail: [EMAIL PROTECTED]


Re: [PERFORM] select count(*) performance (vacuum did not help)

2007-09-24 Thread Bill Moran
In response to Gábor Farkas [EMAIL PROTECTED]:

 Heikki Linnakangas wrote:
  Gábor Farkas wrote:
 
  if i do a full vacuum to that table only, will the database still serve
  data from the other tables at a normal speed?
  
  Yes. The extra I/O load vacuum full generates while it's running might
  disrupt other activity, though.
  
 
 i see.
 
 will i achieve the same thing by simply dropping that table and 
 re-creating it?

Yes.  Once you've done so, keep up the vacuum schedule you've already
established.  You may want to (as has already been suggested) explicitly
vacuum this table after large delete operations as well.

-- 
Bill Moran
Collaborative Fusion Inc.
http://people.collaborativefusion.com/~wmoran/

[EMAIL PROTECTED]
Phone: 412-422-3463x4023


---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [PERFORM] TEXT or LONGTEXT?

2007-09-24 Thread Alexander Staubo
On 9/24/07, Fabiola Fernández [EMAIL PROTECTED] wrote:
 I have a database with an amount of tables and in several of them I have an
 attribute for a semantic definition, for which I use a field of type text. I
 am trying to decide if it would be worth using LONGTEXT instead of TEXT, as
 maybe it would slow down the data insertion and extraction. I hope that you
 could help me. Thank you.

Easy choice -- PostgreSQL does not have a data type named longtext.
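
A quick illustration (table and column names are just examples):

-- CREATE TABLE semantic_def (description longtext);  -- fails: type "longtext" does not exist
CREATE TABLE semantic_def (description text);          -- text has no declared length limit

(longtext is a MySQL type, which is probably where the confusion comes from.)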

Alexander.

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] select count(*) performance (vacuum did not help)

2007-09-24 Thread Csaba Nagy
On Mon, 2007-09-24 at 17:14 +0200, Gábor Farkas wrote:
 will i achieve the same thing by simply dropping that table and 
 re-creating it?

If you have an index/PK on that table, the fastest and most useful way
to rebuild it is to do CLUSTER on that index. That will be a lot faster
than VACUUM FULL and it will also put your table in index order... but
it will also lock it in exclusive mode just as VACUUM FULL would.
If your table has just a few live rows and lots of junk in it, CLUSTER
should be fast enough. With 20K entries I would expect it to be fast
enough not to be a problem...
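
A hedged sketch of that approach, using the index name from the VACUUM output
earlier in the thread (on 7.4/8.1 the CLUSTER syntax names the index first):

CLUSTER main_activity_pkey ON main_activity;  -- rewrites the table in index order, dropping the dead space
ANALYZE main_activity;                        -- refresh statistics after the rewrite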

Cheers,
Csaba.



---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] select count(*) performance (vacuum did not help)

2007-09-24 Thread Dave Dutcher
 -Original Message-
 From: Gábor Farkas
 
 
 i see.
 
 will i achieve the same thing by simply dropping that table 
 and re-creating it?

Yes.  Or even easier (if you don't need the data anymore) you can use the
truncate command, which deletes everything in the table, including dead
rows.
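
For completeness, a one-line sketch of that (only if the current contents really
are disposable):

TRUNCATE TABLE main_activity;  -- instantly discards every row, live or dead, and releases the space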

Dave


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] TEXT or LONGTEXT?

2007-09-24 Thread Niklas Johansson


On 24 sep 2007, at 17.21, Fabiola Fernández wrote:
I am trying to decide if it would be worth using LONGTEXT instead  
of TEXT, as maybe it would slow down the data insertion and  
extraction.


Postgres doesn't have a LONGTEXT datatype, so keep using TEXT.

http://www.postgresql.org/docs/8.2/interactive/datatype-character.html



Sincerely,

Niklas Johansson




---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [PERFORM] Searching for the cause of a bad plan

2007-09-24 Thread Simon Riggs
On Mon, 2007-09-24 at 16:04 +0200, Csaba Nagy wrote:
 On Mon, 2007-09-24 at 14:27 +0100, Simon Riggs wrote:
  Csaba, please can you copy that data into fresh tables, re-ANALYZE and
  then re-post the EXPLAINs, with stats data.
 
 Well, I can of course. I actually tried to generate some random data
 with similar record count and relations between the tables (which I'm
 not sure I succeeded at), without the extra columns, but it was happily
 yielding the nested loop plan. So I guess I really have to copy the
 whole data (several tens of GB).
 
 But from my very limited understanding of what information is available
 for the planner, I thought that the record count estimated for the join
 between table_a and table_b1 on column b should be something like
 
 (estimated record count in table_a for value a) * (weight of b range
 covered by table_b1 and table_a in common) / (weight of b range
 covered by table_a)

There's no such code I'm aware of. Sounds like a good idea though. I'm sure
we could do something with the histogram values, but we don't in the
default selectivity functions.

-- 
  Simon Riggs
  2ndQuadrant  http://www.2ndQuadrant.com


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] REPOST: Nested loops row estimates always too high

2007-09-24 Thread Carlo Stonebanks

Has anyone offered any answers to you? No one else has replied to this post.


---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

   http://www.postgresql.org/about/donate


[PERFORM] Acceptable level of over-estimation?

2007-09-24 Thread Carlo Stonebanks
Is there a rule of thumb about when the planner's row estimates are too 
high? In particular, when should I be concerned that the planner's estimated 
number of rows for a nested loop is off? By a factor of 10? 100? 
1000?


Carlo 



---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

   http://www.postgresql.org/about/donate


Re: [PERFORM] Acceptable level of over-estimation?

2007-09-24 Thread Gregory Stark
Carlo Stonebanks [EMAIL PROTECTED] writes:

 Is there a rule of thumb about when the planner's row estimates are too high?
 In particular, when should I be concerned that planner's estimated number of
 rows estimated for a nested loop is off? By a factor of 10? 100? 1000?

Not really. Either it's a big enough difference for the planner to make a bad
decision or it isn't. But if you pressed me I would say a factor of 10 is bad.
A factor of 2 is inevitable in some cases.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PERFORM] REPOST: Nested loops row estimates always too high

2007-09-24 Thread Ow Mun Heng
On Mon, 2007-09-24 at 14:12 -0400, Carlo Stonebanks wrote:
 Has anyone offered any answers to you? No one else has replied to this post.

Overestimate of selectivity. I guess it's mainly due to my one-to-many
table relationships. I've tried everything from concatenated join
columns and indexing it to creating all sorts of indexes and splitting
the (1) tables into multiple tables and upping the indexes to 1000 and
turning off nestloops/enabling geqo/tweaking the threshold/effort and
much much more (as much as I was asked to/suggested to) but still no
luck.

In my case, the individual queries were fast. So, in the end, I made a
SRF and used the SRFs to join each other. This worked better.


 
 

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] REPOST: Nested loops row estimates always too high

2007-09-24 Thread Carlo Stonebanks
My problem is that I think the SRFs themselves are causing my problems. An SRF
gets an automatic row estimate of 1000 rows. Add a condition to it and the
planner guesses 333 rows. Even at 333, this is an overestimate of the number
of rows actually returned.
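
A hedged illustration (the function name is hypothetical):

EXPLAIN SELECT * FROM my_srf();                  -- the planner assumes rows=1000, whatever the function really returns
EXPLAIN SELECT * FROM my_srf() WHERE some_flag;  -- a filter only scales that default guess (e.g. down to 333)
-- Later releases (8.3 onwards) let the estimate be declared on the function itself:
--   CREATE FUNCTION my_srf() RETURNS SETOF sometype ... ROWS 25000;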

I'm really disappointed - SRFs are a great way to place the enterprise's
db-centric business logic at the server.

Carlo




---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] REPOST: Nested loops row estimates always too high

2007-09-24 Thread Ow Mun Heng
On Tue, 2007-09-25 at 00:53 -0400, Carlo Stonebanks wrote:
 My problem is that I think that SRF's are causing my problems. The SRF's
 gets an automatic row estimate of 1000 rows.

That's correct. That's what I see too, though I may return 10K rows of
data (min 10 columns). But it's way faster than the normal joins I do.

 I'm really disappointed - SRF's are a great way to place the enterprise's
 db-centric business logic at the server.

Actually, I think that in general nested loops, while evil, are just going
to be around. Even in MSSQL, when I'm pulling data from it, the nested loops
are many, and I presume it's the 8x SMP and the multi-GB of RAM that keep the
query fast.


 
 
 

---(end of broadcast)---
TIP 6: explain analyze is your friend