Re: [HACKERS] Very poor estimates from planner

2003-11-09 Thread Rod Taylor
On Thu, 2003-11-06 at 10:35, Tom Lane wrote:
 Rod Taylor [EMAIL PROTECTED] writes:
  -  Hash Join  (cost=3D1230.79..60581.82 rows=3D158 width=3D54)=
   (actual time=3D1262.35..151200.29 rows=3D1121988 loops=3D1)
  Hash Cond: (outer.account_id =3D inner.account_id)
  -  Hash Join  (cost=3D1226.78..52863.43 rows=3D1542558 w=
  idth=3D50) (actual time=3D1261.63..100418.30 rows=3D1573190 loops=3D1)
  (join of bsod, tsb, tss)
 
 (btw, would you mind turning off MIME encoding in your mails to the PG
 lists?  It's a real PITA to quote.)

I can, though I would ask which email client you use that doesn't pull
content out of mime encoded emails.

  So yes, since this is a full table scan all values will be joined since
  the foreign key enforces them all to exist.
 
 Well, no, because only 1121988 rows come out of the join when 1573190
 went in.  So the actual selectivity of the join is about 70%.  The
 question is why the planner is estimating the selectivity at 0.01%
 (158/1542558).
 
 Could we see the pg_stats rows for service.account_id and
 account.account_id?

 relname |  attname   | stanullfrac | stawidth | stadistinct | stakind1
| stakind2 | stakind3 | stakind4 | staop1 | staop2 | staop3 | staop4
|  
stanumbers1| stanumbers2 |
stanumbers3 | stanumbers4 |  
stavalues1| stavalues2 | stavalues3 |
stavalues4
-++-+--+-+--+--+--+--+++++--+-+-+-+-+++
 service | account_id |   0 |4 |  10 |1
|3 |0 |0 | 96 | 97 |  0 |  0 |
{0.388393,0.0825893,0.078125,0.0758929,0.0703125,0.0647321,0.0647321,0.0636161,0.0625,0.0491071}
 | {0.591672}  | | | 
{1,8221,8223,8226,8222,8218,8220,8219,8224,8225}||   |
 account | account_id |   0 |4 |  -1 |2
|3 |0 |0 | 97 | 97 |  0 |  0
|  
| {0.97034}   | | | 
{1,10178,12136,14099,16054,18011,19966,21924,23881,26018,27995} ||   |
(2 rows)




---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] Very poor estimates from planner

2003-11-06 Thread Tom Lane
Rod Taylor [EMAIL PROTECTED] writes:
 -  Hash Join  (cost=3D1230.79..60581.82 rows=3D158 width=3D54)=
  (actual time=3D1262.35..151200.29 rows=3D1121988 loops=3D1)
 Hash Cond: (outer.account_id =3D inner.account_id)
 -  Hash Join  (cost=3D1226.78..52863.43 rows=3D1542558 w=
 idth=3D50) (actual time=3D1261.63..100418.30 rows=3D1573190 loops=3D1)
 (join of bsod, tsb, tss)

(btw, would you mind turning off MIME encoding in your mails to the PG
lists?  It's a real PITA to quote.)

 So yes, since this is a full table scan all values will be joined since
 the foreign key enforces them all to exist.

Well, no, because only 1121988 rows come out of the join when 1573190
went in.  So the actual selectivity of the join is about 70%.  The
question is why the planner is estimating the selectivity at 0.01%
(158/1542558).

Could we see the pg_stats rows for service.account_id and
account.account_id?

regards, tom lane

---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


[Fwd: Re: [HACKERS] Very poor estimates from planner]

2003-11-06 Thread Rod Taylor
On Thu, 2003-11-06 at 10:35, Tom Lane wrote:
 Rod Taylor [EMAIL PROTECTED] writes:
  -  Hash Join  (cost=3D1230.79..60581.82 rows=3D158 width=3D54)=
   (actual time=3D1262.35..151200.29 rows=3D1121988 loops=3D1)
  Hash Cond: (outer.account_id =3D inner.account_id)
  -  Hash Join  (cost=3D1226.78..52863.43 rows=3D1542558 w=
  idth=3D50) (actual time=3D1261.63..100418.30 rows=3D1573190 loops=3D1)
  (join of bsod, tsb, tss)
 
 (btw, would you mind turning off MIME encoding in your mails to the PG
 lists?  It's a real PITA to quote.)

I can, though I would ask which email client you use that doesn't pull
content out of mime encoded emails.

  So yes, since this is a full table scan all values will be joined since
  the foreign key enforces them all to exist.
 
 Well, no, because only 1121988 rows come out of the join when 1573190
 went in.  So the actual selectivity of the join is about 70%.  The
 question is why the planner is estimating the selectivity at 0.01%
 (158/1542558).
 
 Could we see the pg_stats rows for service.account_id and
 account.account_id?

 relname |  attname   | stanullfrac | stawidth | stadistinct | stakind1
| stakind2 | stakind3 | stakind4 | staop1 | staop2 | staop3 | staop4
|  
stanumbers1| stanumbers2 |
stanumbers3 | stanumbers4 |  
stavalues1| stavalues2 | stavalues3 |
stavalues4
-++-+--+-+--+--+--+--+++++--+-+-+-+-+++
 service | account_id |   0 |4 |  10 |1
|3 |0 |0 | 96 | 97 |  0 |  0 |
{0.388393,0.0825893,0.078125,0.0758929,0.0703125,0.0647321,0.0647321,0.0636161,0.0625,0.0491071}
 | {0.591672}  | | | 
{1,8221,8223,8226,8222,8218,8220,8219,8224,8225}||   |
 account | account_id |   0 |4 |  -1 |2
|3 |0 |0 | 97 | 97 |  0 |  0
|  
| {0.97034}   | | | 
{1,10178,12136,14099,16054,18011,19966,21924,23881,26018,27995} ||   |
(2 rows)

-- 
Rod Taylor rbt [at] rbt [dot] ca

Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL
PGP Key: http://www.rbt.ca/rbtpub.asc


---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [HACKERS] Very poor estimates from planner

2003-11-05 Thread scott.marlowe
On Wed, 5 Nov 2003, Rod Taylor wrote:

 Since this is a large query, attachments for the explains / query.
 
 Configuration:
 dev_iqdb=# select version();
 version
 
  PostgreSQL 7.4beta1 on i386-portbld-freebsd4.7, compiled by GCC
 2.95.4
 (1 row)
 
 SET default_statistics_target = 1000;
 ANALYZE;
 set from_collapse_limit = 100;
 set join_collapse_limit = 20;

I'm not sure if that will actually change the default_statistics_target of 
the tables you're analyzing, I think it will only apply to newly created 
tables.  

I believe you have to alter table alter column set statistics 1000 for 
each column you want a statistic of 1000.  You might wanna try starting 
with 50 or 100 and see if that works first.


---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [HACKERS] Very poor estimates from planner

2003-11-05 Thread Rod Taylor
 I'm not sure if that will actually change the default_statistics_target of 
 the tables you're analyzing, I think it will only apply to newly created 
 tables.  
 
 I believe you have to alter table alter column set statistics 1000 for 
 each column you want a statistic of 1000.  You might wanna try starting 
 with 50 or 100 and see if that works first.

Hmm.. I was under the impression that it would work for any tables that
haven't otherwise been overridden.

Sets the default statistics target for table columns that have
not had a column-specific target set via ALTER TABLE SET
STATISTICS. Larger values increase the time needed to do
ANALYZE, but may improve the quality of the planner's estimates.
The default is 10.


signature.asc
Description: This is a digitally signed message part


Re: [HACKERS] Very poor estimates from planner

2003-11-05 Thread Tom Lane
Rod Taylor [EMAIL PROTECTED] writes:
 I'm not sure if that will actually change the default_statistics_target

 Hmm.. I was under the impression that it would work for any tables that
 haven't otherwise been overridden.

It will.  I think Scott is recalling the original circa-7.2
implementation, where it wouldn't.  If you're unsure that you
affected it, check out the actual sizes of the array values in pg_stats.

regards, tom lane

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] Very poor estimates from planner

2003-11-05 Thread Rod Taylor
On Wed, 2003-11-05 at 18:57, Tom Lane wrote:
 Rod Taylor [EMAIL PROTECTED] writes:
  I'm not sure if that will actually change the default_statistics_target
 
  Hmm.. I was under the impression that it would work for any tables that
  haven't otherwise been overridden.
 
 It will.  I think Scott is recalling the original circa-7.2
 implementation, where it wouldn't.  If you're unsure that you
 affected it, check out the actual sizes of the array values in pg_stats.

The plan does change when I remove it and re-analyze it.

Either way, it's off by a factor of a 10^5 or so.


signature.asc
Description: This is a digitally signed message part


Re: [HACKERS] Very poor estimates from planner

2003-11-05 Thread Tom Lane
Rod Taylor [EMAIL PROTECTED] writes:
 Effectively, the planner has amazingly inaccurate row estimates.

It seems the key estimation failure is in this join step:

  -  Hash Join  (cost=1230.79..60581.82 rows=158 width=54) (actual 
time=1262.35..151200.29 rows=1121988 loops=1)
Hash Cond: (outer.account_id = inner.account_id)
-  Hash Join  (cost=1226.78..52863.43 rows=1542558 width=50) (actual 
time=1261.63..100418.30 rows=1573190 loops=1)
  (join of bsod, tsb, tss)
-  Hash  (cost=4.01..4.01 rows=1 width=4) (actual time=0.33..0.33 
rows=0 loops=1)
  -  Index Scan using single_null_parent_account_hack on account 
ap  (cost=0.00..4.01 rows=1 width=4) (actual time=0.26..0.28 rows=1 loops=1)
Filter: (parent_account_id IS NULL)

The estimated number of rows out of the join of bsod, tsb, tss isn't far
off, but the estimate for the result of joining that to ap is WAY off.
Apparently the planner thinks that only a few rows in the join will have
matches in ap, but really they almost all do.  Any idea why?  The
account_id stats for each seem to be the thing to look at.

 Any hints? I'm basically stuck. Oh, and I would like to ask for a
 pgadmin feature -- visual explain :)

You do know that Red Hat has been offering a Visual Explain tool for
some time?
http://sources.redhat.com/rhdb/
I've not had much occasion to use it myself, but it works ...

regards, tom lane

---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] Very poor estimates from planner

2003-11-05 Thread scott.marlowe
On Wed, 5 Nov 2003, Tom Lane wrote:

 Rod Taylor [EMAIL PROTECTED] writes:
  I'm not sure if that will actually change the default_statistics_target
 
  Hmm.. I was under the impression that it would work for any tables that
  haven't otherwise been overridden.
 
 It will.  I think Scott is recalling the original circa-7.2
 implementation, where it wouldn't.  If you're unsure that you
 affected it, check out the actual sizes of the array values in pg_stats.

Hey, can anyone guess what version I'm running in production :-)  Thanks 
for the catch.


---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [HACKERS] Very poor estimates from planner

2003-11-05 Thread Rod Taylor
On Wed, 2003-11-05 at 19:18, Tom Lane wrote:
 Rod Taylor [EMAIL PROTECTED] writes:
  Effectively, the planner has amazingly inaccurate row estimates.
 
 It seems the key estimation failure is in this join step:
 
   -  Hash Join  (cost=1230.79..60581.82 rows=158 width=54) (actual 
 time=1262.35..151200.29 rows=1121988 loops=1)
 Hash Cond: (outer.account_id = inner.account_id)
 -  Hash Join  (cost=1226.78..52863.43 rows=1542558 width=50) 
 (actual time=1261.63..100418.30 rows=1573190 loops=1)
   (join of bsod, tsb, tss)
 -  Hash  (cost=4.01..4.01 rows=1 width=4) (actual time=0.33..0.33 
 rows=0 loops=1)
   -  Index Scan using single_null_parent_account_hack on 
 account ap  (cost=0.00..4.01 rows=1 width=4) (actual time=0.26..0.28 rows=1 loops=1)
 Filter: (parent_account_id IS NULL)
 
 The estimated number of rows out of the join of bsod, tsb, tss isn't far
 off, but the estimate for the result of joining that to ap is WAY off.
 Apparently the planner thinks that only a few rows in the join will have
 matches in ap, but really they almost all do.  Any idea why?  The
 account_id stats for each seem to be the thing to look at.

The account structure is tree, as is the product catalogue. Essentially
what the query does is convert high level recorded invoices into the
lower (not quite base) items for billing the different parties involved
in the transaction. bsod and tsb are both foreign keys to tss on the
columns being joined. Since these are full table scans, all values will
join.

This join in particular is  bsod (lineitems) and tsb (cached graph of
the product catalog tree) which are both foreign key'd off of tss
(product catalog tree).

So yes, since this is a full table scan all values will be joined since
the foreign key enforces them all to exist.

  Any hints? I'm basically stuck. Oh, and I would like to ask for a
  pgadmin feature -- visual explain :)
 
 You do know that Red Hat has been offering a Visual Explain tool for
 some time?
 http://sources.redhat.com/rhdb/
 I've not had much occasion to use it myself, but it works ...

Yeah.. but pgadmin is in the ports tree.

I'll take a peak at it.


signature.asc
Description: This is a digitally signed message part