Re: [GENERAL] Why does row estimation on nested loop make no sense to me

2013-05-20 Thread Jeff Amiel
Ok - I agree - 

Can somebody help me understand where the row estimates come from on a 
nested-loop operation in postgres then?



- Original Message -
From: hubert depesz lubaczewski dep...@depesz.com
To: Jeff Amiel becauseimj...@yahoo.com
Cc: pgsql-general@postgresql.org pgsql-general@postgresql.org
Sent: Saturday, May 18, 2013 3:39 AM
Subject: Re: [GENERAL] Why does row estimation on nested loop make no sense
 to me

Not sure if it helps, but it's apparently not a very rare thing.
Quick analysis on data from explain.depesz.com showed that  12% of
plans with nested loop have such estimate.

Couple of examples:

http://explain.depesz.com/s/Qm4
http://explain.depesz.com/s/qmW
http://explain.depesz.com/s/qnG
http://explain.depesz.com/s/QO
http://explain.depesz.com/s/qov

...



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Why does row estimation on nested loop make no sense to me

2013-05-20 Thread Amit Langote
On Mon, May 20, 2013 at 11:01 PM, Jeff Amiel becauseimj...@yahoo.com wrote:
 Ok - I agree -

 Can somebody help me understand where the row estimates come from on a 
 nested-loop operation in postgres then?


In case you haven't noticed already in the documentation, there are
following lines:

... It might appear from inspection of the EXPLAIN output that the
estimate of join rows comes from 50 * 1, that is, the number of outer
rows times the estimated number of rows obtained by each inner index
scan on tenk2. But this is not the case: *the join relation size is
estimated before any particular join plan has been considered*. If
everything is working well then the two ways of estimating the join
size will produce about the same answer, but due to roundoff error and
other factors they sometimes diverge significantly.

Read more at: 
http://www.postgresql.org/docs/9.2/static/row-estimation-examples.html

It also refers where in source code these table size estimations are done.

Hope this helps.


--
Amit Langote


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Why does row estimation on nested loop make no sense to me

2013-05-20 Thread Amit Langote
I also found one other discussion which has similar issues addressed:

http://postgresql.1045698.n5.nabble.com/Bogus-nestloop-rows-estimate-in-8-4-7-td5710254.html

--
Amit Langote


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Why does row estimation on nested loop make no sense to me

2013-05-20 Thread Jeff Amiel
Thanks much!
(sorry for top-posting, yahoo email sucks)




- Original Message -
From: Amit Langote amitlangot...@gmail.com
To: Jeff Amiel becauseimj...@yahoo.com
Cc: dep...@depesz.com dep...@depesz.com; pgsql-general@postgresql.org 
pgsql-general@postgresql.org
Sent: Monday, May 20, 2013 9:51 AM
Subject: Re: [GENERAL] Why does row estimation on nested loop make no sense to 
me

I also found one other discussion which has similar issues addressed:

http://postgresql.1045698.n5.nabble.com/Bogus-nestloop-rows-estimate-in-8-4-7-td5710254.html

--
Amit Langote



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Why does row estimation on nested loop make no sense to me

2013-05-20 Thread Amit Langote
On Tue, May 21, 2013 at 12:43 AM, Jeff Amiel becauseimj...@yahoo.com wrote:
 Thanks much!
 (sorry for top-posting, yahoo email sucks)


I wonder if you could arrive at some conclusions with the statistics
(pg_stats) you have and the join selectivity formulas described in the
referred documentation link. I would like to know if you still get the
same row  estimates (after explain) and also if possible, the value
that is computed from that formula. Do they resemble each other?


--
Amit Langote


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Why does row estimation on nested loop make no sense to me

2013-05-18 Thread hubert depesz lubaczewski
Not sure if it helps, but it's apparently not a very rare thing.
Quick analysis on data from explain.depesz.com showed that  12% of
plans with nested loop have such estimate.

Couple of examples:

http://explain.depesz.com/s/Qm4
http://explain.depesz.com/s/qmW
http://explain.depesz.com/s/qnG
http://explain.depesz.com/s/QO
http://explain.depesz.com/s/qov
http://explain.depesz.com/s/qqb
http://explain.depesz.com/s/QqF
http://explain.depesz.com/s/qQO
http://explain.depesz.com/s/qrI
http://explain.depesz.com/s/QRK
http://explain.depesz.com/s/QUX9
http://explain.depesz.com/s/QvN
http://explain.depesz.com/s/QWL
http://explain.depesz.com/s/r4F
http://explain.depesz.com/s/R7q
http://explain.depesz.com/s/r8
http://explain.depesz.com/s/R8
http://explain.depesz.com/s/RaB
http://explain.depesz.com/s/RbV
http://explain.depesz.com/s/Rc7

all these plans are public and not anonymized.

depesz



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Why does row estimation on nested loop make no sense to me

2013-05-17 Thread Jeff Amiel
On most nested loops that I do explain/explain analyze on, the row estimation 
for the nested-loop itself is a product of the inner nodes of the nested loop.
However in this case, I am stumped!

explain 
select  era.child_entity  from entity_rel era  join user_entity ue on 
ue.entity_id = era.parent_entity and ue.user_id=12345

Nested Loop  (cost=0.00..2903.37 rows=29107 width=4)
  -  Index Only Scan using entity_pk on user_entity ue  (cost=0.00..62.68 
rows=2 width=4)
    Index Cond: (user_id = 10954)
  -  Index Scan using rel_parent on entity_rel era  (cost=0.00..1261.85 
rows=317 width=8)
    Index Cond: (parent_entity = ue.entity_id)


How can the estimated number of rows for the nested loop node EXCEED the 
product of the 2 row estimates of the tables being joined?
Not only does it exceed it - but it is orders of magnitude greater.  

Am I missing something obvious here?  I an see the nested loop row estimate 
being LESS but certainly not more.



PostgreSQL 9.2.4 on x86_64-pc-solaris2.10, compiled by gcc (GCC) 3.4.3 
(csl-sol210-3_4-branch+sol_rpath), 64-bit



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Why does row estimation on nested loop make no sense to me

2013-05-17 Thread Tom Lane
Jeff Amiel becauseimj...@yahoo.com writes:
 How can the estimated number of rows for the nested loop node EXCEED the 
 product of the 2 row estimates of the tables being joined?
 Not only does it exceed it - but it is orders of magnitude greater.

Can you provide a self-contained test case that does this?

regards, tom lane


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Why does row estimation on nested loop make no sense to me

2013-05-17 Thread Amit Langote
On Sat, May 18, 2013 at 1:25 AM, Jeff Amiel becauseimj...@yahoo.com wrote:
 On most nested loops that I do explain/explain analyze on, the row estimation 
 for the nested-loop itself is a product of the inner nodes of the nested loop.
 However in this case, I am stumped!

 explain
 select  era.child_entity  from entity_rel era  join user_entity ue on 
 ue.entity_id = era.parent_entity and ue.user_id=12345

 Nested Loop  (cost=0.00..2903.37 rows=29107 width=4)
   -  Index Only Scan using entity_pk on user_entity ue  (cost=0.00..62.68 
 rows=2 width=4)
 Index Cond: (user_id = 10954)
   -  Index Scan using rel_parent on entity_rel era  (cost=0.00..1261.85 
 rows=317 width=8)
 Index Cond: (parent_entity = ue.entity_id)


 How can the estimated number of rows for the nested loop node EXCEED the 
 product of the 2 row estimates of the tables being joined?
 Not only does it exceed it - but it is orders of magnitude greater.

 Am I missing something obvious here?  I an see the nested loop row estimate 
 being LESS but certainly not more.


Can you also post the output of explain analyze your-query?


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Why does row estimation on nested loop make no sense to me

2013-05-17 Thread Jeff Amiel




 Can you provide a self-contained test case that does this?

That response scares me.
:)
I can try - Every other table set (small, easy to experiment with)  returns 
results as expected - 
Is the implication that this looks 'unusual'?


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Why does row estimation on nested loop make no sense to me

2013-05-17 Thread Jeff Amiel




- Original Message -
From: Amit Langote amitlangot...@gmail.com
To: Jeff Amiel becauseimj...@yahoo.com
Cc: pgsql-general@postgresql.org pgsql-general@postgresql.org
Sent: Friday, May 17, 2013 11:37 AM
Subject: Re: [GENERAL] Why does row estimation on nested loop make no sense to 
me

On Sat, May 18, 2013 at 1:25 AM, Jeff Amiel becauseimj...@yahoo.com wrote:
 On most nested loops that I do explain/explain analyze on, the row estimation 
 for the nested-loop itself is a product of the inner nodes of the nested loop.
 However in this case, I am stumped!

 explain
 select  era.child_entity  from entity_rel era  join user_entity ue on 
 ue.entity_id = era.parent_entity and ue.user_id=12345

 Nested Loop  (cost=0.00..2903.37 rows=29107 width=4)
   -  Index Only Scan using entity_pk on user_entity ue  (cost=0.00..62.68 
rows=2 width=4)
         Index Cond: (user_id = 10954)
   -  Index Scan using rel_parent on entity_rel era  (cost=0.00..1261.85 
rows=317 width=8)
         Index Cond: (parent_entity = ue.entity_id)


 How can the estimated number of rows for the nested loop node EXCEED the 
 product of the 2 row estimates of the tables being joined?
 Not only does it exceed it - but it is orders of magnitude greater.

 Am I missing something obvious here?  I an see the nested loop row estimate 
 being LESS but certainly not more.


 Can you also post the output of explain analyze your-query?

I'm not worried about performance (per se) but the row estimation issue which 
propagates up as part of a bigger query.  But here ya go:

explain analyze
select  era.child_entity  from entity_rel era  join user_entity ue on 
ue.entity_id = era.parent_entity and ue.user_id=12345

Nested Loop  (cost=0.00..2903.37 rows=29107 width=4) (actual time=0.028..0.274 
rows=201 loops=1)
  -  Index Only Scan using entity_pk on user_entity ue  (cost=0.00..62.68 
rows=2 width=4) (actual time=0.011..0.012 rows=1 loops=1)
    Index Cond: (user_id = 12345)
    Heap Fetches: 1
  -  Index Scan using rel_parent on entity_rel era  (cost=0.00..1261.85 
rows=317 width=8) (actual time=0.013..0.164 rows=201 loops=1)
    Index Cond: (parent_entity = ue.entity_id)
Total runtime: 0.361 ms


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Why does row estimation on nested loop make no sense to me

2013-05-17 Thread Amit Langote
On Sat, May 18, 2013 at 1:47 AM, Jeff Amiel becauseimj...@yahoo.com wrote:




 - Original Message -
 From: Amit Langote amitlangot...@gmail.com
 To: Jeff Amiel becauseimj...@yahoo.com
 Cc: pgsql-general@postgresql.org pgsql-general@postgresql.org
 Sent: Friday, May 17, 2013 11:37 AM
 Subject: Re: [GENERAL] Why does row estimation on nested loop make no sense 
 to me

 On Sat, May 18, 2013 at 1:25 AM, Jeff Amiel becauseimj...@yahoo.com wrote:
 On most nested loops that I do explain/explain analyze on, the row 
 estimation for the nested-loop itself is a product of the inner nodes of the 
 nested loop.
 However in this case, I am stumped!

 explain
 select  era.child_entity  from entity_rel era  join user_entity ue on 
 ue.entity_id = era.parent_entity and ue.user_id=12345

 Nested Loop  (cost=0.00..2903.37 rows=29107 width=4)
   -  Index Only Scan using entity_pk on user_entity ue  (cost=0.00..62.68 
 rows=2 width=4)
 Index Cond: (user_id = 10954)
   -  Index Scan using rel_parent on entity_rel era  (cost=0.00..1261.85 
 rows=317 width=8)
 Index Cond: (parent_entity = ue.entity_id)


 How can the estimated number of rows for the nested loop node EXCEED the 
 product of the 2 row estimates of the tables being joined?
 Not only does it exceed it - but it is orders of magnitude greater.

 Am I missing something obvious here?  I an see the nested loop row estimate 
 being LESS but certainly not more.


 Can you also post the output of explain analyze your-query?

 I'm not worried about performance (per se) but the row estimation issue which 
 propagates up as part of a bigger query.  But here ya go:

 explain analyze
 select  era.child_entity  from entity_rel era  join user_entity ue on 
 ue.entity_id = era.parent_entity and ue.user_id=12345

 Nested Loop  (cost=0.00..2903.37 rows=29107 width=4) (actual 
 time=0.028..0.274 rows=201 loops=1)
   -  Index Only Scan using entity_pk on user_entity ue  (cost=0.00..62.68 
 rows=2 width=4) (actual time=0.011..0.012 rows=1 loops=1)
 Index Cond: (user_id = 12345)
 Heap Fetches: 1
   -  Index Scan using rel_parent on entity_rel era  (cost=0.00..1261.85 
 rows=317 width=8) (actual time=0.013..0.164 rows=201 loops=1)
 Index Cond: (parent_entity = ue.entity_id)
 Total runtime: 0.361 ms

Have you tried analyze (it's probably a case of insufficient/outdated
statistics to planner's disposal) or probably consider changing
default_statistics_target?


--
Amit Langote


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Why does row estimation on nested loop make no sense to me

2013-05-17 Thread Jeff Amiel




 explain analyze
 select  era.child_entity  from entity_rel era  join user_entity ue on 
 ue.entity_id = era.parent_entity and ue.user_id=12345

 Nested Loop  (cost=0.00..2903.37 rows=29107 width=4) (actual 
 time=0.028..0.274 rows=201 loops=1)
   -  Index Only Scan using entity_pk on user_entity ue  (cost=0.00..62.68 
rows=2 width=4) (actual time=0.011..0.012 rows=1 loops=1)
         Index Cond: (user_id = 12345)
         Heap Fetches: 1
   -  Index Scan using rel_parent on entity_rel era  (cost=0.00..1261.85 
rows=317 width=8) (actual time=0.013..0.164 rows=201 loops=1)
         Index Cond: (parent_entity = ue.entity_id)
 Total runtime: 0.361 ms

Have you tried analyze (it's probably a case of insufficient/outdated
statistics to planner's disposal) or probably consider changing
default_statistics_target?


Again - my question revolves not around the whether or not I am getting good or 
bad estimates - my question is related to the fact that the nested-loop row 
estimation does not appear to be derived from the nodes below it - it is off by 
orders of magnitude.  I've never seen this before.
That aside, yes - I did analyze and tweak stats target during experimentation - 
no change.


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Why does row estimation on nested loop make no sense to me

2013-05-17 Thread Amit Langote
 explain analyze
 select  era.child_entity  from entity_rel era  join user_entity ue on 
 ue.entity_id = era.parent_entity and ue.user_id=12345

 Nested Loop  (cost=0.00..2903.37 rows=29107 width=4) (actual 
 time=0.028..0.274 rows=201 loops=1)
   -  Index Only Scan using entity_pk on user_entity ue  (cost=0.00..62.68 
 rows=2 width=4) (actual time=0.011..0.012 rows=1 loops=1)
 Index Cond: (user_id = 12345)
 Heap Fetches: 1
   -  Index Scan using rel_parent on entity_rel era  (cost=0.00..1261.85 
 rows=317 width=8) (actual time=0.013..0.164 rows=201 loops=1)
 Index Cond: (parent_entity = ue.entity_id)
 Total runtime: 0.361 ms


I noticed when the explain output in your first mail shows Index Cond:
(user_id = 10954) whereas your query says: ue.user_id=12345. Something
with that? Although, your explain analyze does show the same values at
both places with the row estimate being 29107 in both cases, which,
well, looks awful and quite unexpected though there seem to have been
similar observations before


Have you tried analyze (it's probably a case of insufficient/outdated
statistics to planner's disposal) or probably consider changing
default_statistics_target?


 Again - my question revolves not around the whether or not I am getting good 
 or bad estimates - my question is related to the fact that the nested-loop 
 row estimation does not appear to be derived from the nodes below it - it is 
 off by orders of magnitude.  I've never seen this before.
 That aside, yes - I did analyze and tweak stats target during experimentation 
 - no change.

Did you also check select count(*) on both the relations and found
related numbers?

--
Amit Langote


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Why does row estimation on nested loop make no sense to me

2013-05-17 Thread Jeff Amiel




- Original Message -
From: Amit Langote amitlangot...@gmail.com
To: Jeff Amiel becauseimj...@yahoo.com
Cc: pgsql-general@postgresql.org pgsql-general@postgresql.org
Sent: Friday, May 17, 2013 2:21 PM
Subject: Re: [GENERAL] Why does row estimation on nested loop make no sense to 
me

 explain analyze
 select  era.child_entity  from entity_rel era  join user_entity ue on 
 ue.entity_id = era.parent_entity and ue.user_id=12345

 Nested Loop  (cost=0.00..2903.37 rows=29107 width=4) (actual 
 time=0.028..0.274 rows=201 loops=1)
   -  Index Only Scan using entity_pk on user_entity ue  (cost=0.00..62.68 
rows=2 width=4) (actual time=0.011..0.012 rows=1 loops=1)
         Index Cond: (user_id = 12345)
         Heap Fetches: 1
   -  Index Scan using rel_parent on entity_rel era  (cost=0.00..1261.85 
rows=317 width=8) (actual time=0.013..0.164 rows=201 loops=1)
         Index Cond: (parent_entity = ue.entity_id)
 Total runtime: 0.361 ms


I noticed when the explain output in your first mail shows Index Cond:
(user_id = 10954) whereas your query says: ue.user_id=12345. Something
with that? Although, your explain analyze does show the same values at
both places with the row estimate being 29107 in both cases, which,
well, looks awful and quite unexpected though there seem to have been
similar observations before

That was a weak attempt at hiding 'real' data - intended to change them all to 
12345.
:)

Did you also check select count(*) on both the relations and found
related numbers?

Nothing related (that I could find)  on the rowcounts - one table has 20 
million rows or so ad the other 65K.


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general