Re: [HACKERS] Weirdly pesimistic estimates in optimizer

2015-03-06 Thread Tom Lane
I wrote:
 I chewed on this for awhile and decided that there'd be no real harm in
 taking identification of the unique expressions out of 
 create_unique_path() and doing it earlier, in initsplan.c; we'd need a
 couple more fields in SpecialJoinInfo but that doesn't seem like a
 problem.  However, rel-rows is a *big* problem; we simply have not made
 any join size estimates yet, and can't, because these things are done
 bottom up.

 However ... estimate_num_groups's dependency on its rowcount input is not
 large (it's basically using it as a clamp).  So conceivably we could have
 get_loop_count just multiply together the sizes of the base relations
 included in the semijoin's RHS to get a preliminary estimate of that
 number.  This would be the right thing anyway for a single relation in the
 RHS, which is the most common case.  It would usually be an overestimate
 for join RHS, but we could hope that the output of estimate_num_groups
 wouldn't be affected too badly.

Attached is a draft patch that does those two things.  I like the first
part (replacing SpecialJoinInfo's rather ad-hoc join_quals field with
something more explicitly attuned to semijoin uniqueness processing).
The second part is still pretty much of a kluge, but then get_loop_count
was a kluge already.  This arguably makes it better.

Now, on the test case you presented, this has the unfortunate effect that
it now reliably chooses the wrong plan for both cases :-(.  But I think
that's a reflection of poor cost parameters (ie, test case fits handily in
RAM but we've not set the cost parameters to reflect that).  We do get the
same rowcount and roughly-same cost estimates for both the random_fk_dupl
and random_fk_uniq queries, so from that standpoint it's doing the right
thing.  If I reduce random_page_cost to 2 or so, it makes the choices you
wanted.

regards, tom lane

diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 9fe8008..9f65d66 100644
*** a/src/backend/nodes/copyfuncs.c
--- b/src/backend/nodes/copyfuncs.c
*** _copySpecialJoinInfo(const SpecialJoinIn
*** 1942,1948 
  	COPY_SCALAR_FIELD(jointype);
  	COPY_SCALAR_FIELD(lhs_strict);
  	COPY_SCALAR_FIELD(delay_upper_joins);
! 	COPY_NODE_FIELD(join_quals);
  
  	return newnode;
  }
--- 1942,1951 
  	COPY_SCALAR_FIELD(jointype);
  	COPY_SCALAR_FIELD(lhs_strict);
  	COPY_SCALAR_FIELD(delay_upper_joins);
! 	COPY_SCALAR_FIELD(semi_can_btree);
! 	COPY_SCALAR_FIELD(semi_can_hash);
! 	COPY_NODE_FIELD(semi_operators);
! 	COPY_NODE_FIELD(semi_rhs_exprs);
  
  	return newnode;
  }
diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c
index fe509b0..fd876fb 100644
*** a/src/backend/nodes/equalfuncs.c
--- b/src/backend/nodes/equalfuncs.c
*** _equalSpecialJoinInfo(const SpecialJoinI
*** 798,804 
  	COMPARE_SCALAR_FIELD(jointype);
  	COMPARE_SCALAR_FIELD(lhs_strict);
  	COMPARE_SCALAR_FIELD(delay_upper_joins);
! 	COMPARE_NODE_FIELD(join_quals);
  
  	return true;
  }
--- 798,807 
  	COMPARE_SCALAR_FIELD(jointype);
  	COMPARE_SCALAR_FIELD(lhs_strict);
  	COMPARE_SCALAR_FIELD(delay_upper_joins);
! 	COMPARE_SCALAR_FIELD(semi_can_btree);
! 	COMPARE_SCALAR_FIELD(semi_can_hash);
! 	COMPARE_NODE_FIELD(semi_operators);
! 	COMPARE_NODE_FIELD(semi_rhs_exprs);
  
  	return true;
  }
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 775f482..75c57a2 100644
*** a/src/backend/nodes/outfuncs.c
--- b/src/backend/nodes/outfuncs.c
*** _outSpecialJoinInfo(StringInfo str, cons
*** 1945,1951 
  	WRITE_ENUM_FIELD(jointype, JoinType);
  	WRITE_BOOL_FIELD(lhs_strict);
  	WRITE_BOOL_FIELD(delay_upper_joins);
! 	WRITE_NODE_FIELD(join_quals);
  }
  
  static void
--- 1945,1954 
  	WRITE_ENUM_FIELD(jointype, JoinType);
  	WRITE_BOOL_FIELD(lhs_strict);
  	WRITE_BOOL_FIELD(delay_upper_joins);
! 	WRITE_BOOL_FIELD(semi_can_btree);
! 	WRITE_BOOL_FIELD(semi_can_hash);
! 	WRITE_NODE_FIELD(semi_operators);
! 	WRITE_NODE_FIELD(semi_rhs_exprs);
  }
  
  static void
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 5a9daf0..1a0d358 100644
*** a/src/backend/optimizer/path/costsize.c
--- b/src/backend/optimizer/path/costsize.c
*** compute_semi_anti_join_factors(PlannerIn
*** 3294,3300 
  	/* we don't bother trying to make the remaining fields valid */
  	norm_sjinfo.lhs_strict = false;
  	norm_sjinfo.delay_upper_joins = false;
! 	norm_sjinfo.join_quals = NIL;
  
  	nselec = clauselist_selectivity(root,
  	joinquals,
--- 3294,3303 
  	/* we don't bother trying to make the remaining fields valid */
  	norm_sjinfo.lhs_strict = false;
  	norm_sjinfo.delay_upper_joins = false;
! 	norm_sjinfo.semi_can_btree = false;
! 	norm_sjinfo.semi_can_hash = false;
! 	norm_sjinfo.semi_operators = NIL;
! 	norm_sjinfo.semi_rhs_exprs = NIL;
  
  	nselec = clauselist_selectivity(root,
  	joinquals,
*** 

Re: [HACKERS] Weirdly pesimistic estimates in optimizer

2015-03-05 Thread Jim Nasby

On 2/28/15 12:01 PM, David Kubečka wrote:

With 'random_fk_dupl':
  -  Index Scan using facts_fk_idx on facts  (cost=0.42..5.75
rows=100 width=15) (actual time=0.009..0.117 rows=98 loops=100)
With 'random_fk_uniq':
  -  Index Scan using facts_fk_idx on facts (cost=0.42..214.26
rows=100 width=15) (actual time=0.007..0.109 rows=98 loops=100)

I have read the optimizer README file and also looked briefly at the
code, but this seems to be something not related to particular
implementation of algorithm (e.g. nested loop). Perhaps it's the way how
cost estimates are propagated down (or sideways? that would be weird...)
the query tree. But I am really not sure, since this is my first time
lookng at the optimizer code base. I should also add that I have
reproduced this behaviour for all versions of Pg from 9.2 up to current
devel.


This got answered on one of the other lists, right?
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Weirdly pesimistic estimates in optimizer

2015-03-05 Thread Jim Nasby

On 3/5/15 7:58 PM, Jim Nasby wrote:

This got answered on one of the other lists, right?


That was supposed to be off-list. I'll answer my own question: yes.

Sorry for the noise. :(
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Weirdly pesimistic estimates in optimizer

2015-03-05 Thread Tom Lane
=?UTF-8?Q?David_Kube=C4=8Dka?= kubecka@gmail.com writes:
 There is division by loop_count because of predicted effect of caching and
 it is exactly this division which makes the run_cost for single index
 lookup so low compared with the query version with random_fk_uniq.  So the
 main problem is that the planner calls cost_index with loop_count equal to
 number of rows of inner table, *but it ignores semi-join semantics*, i.e.
 it doesn't account for the number of *unique* rows in the inner table which
 will actually be the number of loops.

Good point.

 Since this is my first time looking into the optimizer (and in fact any
 postgres) code I am not yet able to locate the exact place where this
 should be repaired, but I hope that in few days I will have a patch :-)

Hm, this might not be the best problem to tackle for your first Postgres
patch :-(.  The information about the estimated number of unique rows
isn't readily available at the point where we're creating parameterized
paths.  When we do compute such an estimate, it's done like this:

pathnode-path.rows = estimate_num_groups(root, uniq_exprs, rel-rows);

where uniq_exprs is a list of right-hand-side expressions we've determined
belong to the semijoin, and rel-rows represents the raw size of the
semijoin's RHS relation (which might itself be a join).  Neither of those
things are available to create_index_path.

I chewed on this for awhile and decided that there'd be no real harm in
taking identification of the unique expressions out of 
create_unique_path() and doing it earlier, in initsplan.c; we'd need a
couple more fields in SpecialJoinInfo but that doesn't seem like a
problem.  However, rel-rows is a *big* problem; we simply have not made
any join size estimates yet, and can't, because these things are done
bottom up.

However ... estimate_num_groups's dependency on its rowcount input is not
large (it's basically using it as a clamp).  So conceivably we could have
get_loop_count just multiply together the sizes of the base relations
included in the semijoin's RHS to get a preliminary estimate of that
number.  This would be the right thing anyway for a single relation in the
RHS, which is the most common case.  It would usually be an overestimate
for join RHS, but we could hope that the output of estimate_num_groups
wouldn't be affected too badly.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Weirdly pesimistic estimates in optimizer

2015-03-04 Thread Robert Haas
On Mon, Mar 2, 2015 at 2:19 PM, David Kubečka kubecka@gmail.com wrote:
 The question is why optimizer, or rather the cost estimator, produced so
 much different estimates upon very small change in input. Moreover it seems
 that the main culprit of bad estimates isn't actually directly related to
 outer table, but rather to inner table. Just compare estimates for the two
 index scans:

 With 'random_fk_dupl':
  -  Index Scan using facts_fk_idx on facts  (cost=0.42..5.75
 rows=100 width=15) (actual time=0.009..0.117 rows=98 loops=100)
 With 'random_fk_uniq':
  -  Index Scan using facts_fk_idx on facts  (cost=0.42..214.26
 rows=100 width=15) (actual time=0.007..0.109 rows=98 loops=100)

Whoa.  That's pretty dramatic.  Am I correctly understanding that the
two tables contain *exactly* the same data?  Could the issue be that
the optimizer thinks that one of the tables is thought to have a
higher logical-to-physical correlation than the other (see
pg_stats.correlation)?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Weirdly pesimistic estimates in optimizer

2015-03-04 Thread David Kubečka
Hi Tomas and others,

2015-03-02 21:29 GMT+01:00 Tomas Vondra tomas.von...@2ndquadrant.com:

 Hi David ;-)

 On 2.3.2015 20:19, David Kubečka wrote:
 
  The question is why optimizer, or rather the cost estimator,
  produced so much different estimates upon very small change in input.
  Moreover it seems that the main culprit of bad estimates isn't
  actually directly related to outer table, but rather to inner table.
  Just compare estimates for the two index scans:
 
  With 'random_fk_dupl':
   -  Index Scan using facts_fk_idx on facts  (cost=0.42..5.75
  rows=100 width=15) (actual time=0.009..0.117 rows=98 loops=100)
  With 'random_fk_uniq':
   -  Index Scan using facts_fk_idx on facts  (cost=0.42..214.26
  rows=100 width=15) (actual time=0.007..0.109 rows=98 loops=100)
 
  I have read the optimizer README file and also looked briefly at the
   code, but this seems to be something not related to particular
  implementation of algorithm (e.g. nested loop). Perhaps it's the way
  how cost estimates are propagated down (or sideways? that would be
  weird...) the query tree. But I am really not sure, since this is my
  first time lookng at the optimizer code base. I should also add that
  I have reproduced this behaviour for all versions of Pg from 9.2 up
  to current devel.

 Interesting. I've only spent a few minutes looking at this, but it
 certainly is a bit strange that using smaller table with unique values
 results in a slower plan than using a table with duplicities (but the
 same number of unique values).


Yep. And I think that I have found the cause of the strangeness. I have
traced the planner code and this is how cost_index from costsize.c is
called when computing the cost of index scan on outer (facts) table, when
the inner table is with duplicate entries (random_fk_dupl):

cost_index (path=0x27f8fb8, root=0x27670e8, loop_count=9920)

The important thing here is the value of loop_count which is the (perfect
estimate of) number of rows of random_fk_dupl (I have recreated the table
so it differs slightly from previously posted version (9893)). Now the
predominant part of running cost is computed by this expression:

run_cost += max_IO_cost + csquared * (min_IO_cost - max_IO_cost);

Since csquared (indexCorrelation * indexCorrelation) is very small in this
case, the biggest term here is max_IO_cost, which is computed as follows:

max_IO_cost = (pages_fetched * spc_random_page_cost) / loop_count;

There is division by loop_count because of predicted effect of caching and
it is exactly this division which makes the run_cost for single index
lookup so low compared with the query version with random_fk_uniq.  So the
main problem is that the planner calls cost_index with loop_count equal to
number of rows of inner table, *but it ignores semi-join semantics*, i.e.
it doesn't account for the number of *unique* rows in the inner table which
will actually be the number of loops.

Since this is my first time looking into the optimizer (and in fact any
postgres) code I am not yet able to locate the exact place where this
should be repaired, but I hope that in few days I will have a patch :-)


 Another observation is that the planner does see other plans, because
 tweaking the cost variables like this:

 SET random_page_cost = 2;

 produces this plan (which on my machine runs just a tad slower than the
 first query in your post):

 QUERY PLAN
 -
  HashAggregate  (cost=10564.81..10688.47 rows=9893 width=15)
Group Key: facts.fk
-  Nested Loop  (cost=5.42..10515.34 rows=9893 width=15)
-  HashAggregate  (cost=2.24..3.23 rows=99 width=4)
  Group Key: random_fk_uniq.fk
  -  Seq Scan on random_fk_uniq  (cost=0.00..1.99 ...)
-  Bitmap Heap Scan on facts  (cost=3.18..105.18 rows=100 ...)
Recheck Cond: (fk = random_fk_uniq.fk)
-  Bitmap Index Scan on facts_fk_idx  (cost=0.00..3.15 ...)
  Index Cond: (fk = random_fk_uniq.fk)


Yeah, this makes now perfect sense to me. From the run_cost and max_IO_cost
formulas you can see that (in this case!) random_page_cost is almost
exactly the linear factor here. So eventually the results are completely
opposite than I wished at first, i.e. instead of tweaking postgres to
produce lower costs for random_fk_uniq it will start producing higher costs
for random_fk_dupl. But I now believe that this is correct.

Regards,

-David


 I can get similar results by setting cpu_operator_cost=0.005. And
 further lowering to random_page_cost to 1.1 actually gets me this:

 QUERY PLAN
 -
  HashAggregate  (cost=6185.41..6309.08 rows=9893 width=15)
Group Key: facts.fk
-  Nested Loop  (cost=2.66..6135.95 rows=9893 width=15)
   -  HashAggregate  (cost=2.24..3.23 rows=99 width=4)
  

Re: [HACKERS] Weirdly pesimistic estimates in optimizer

2015-03-03 Thread Kevin Grittner
David Kubečka kubecka@gmail.com wrote:

 I have read the optimizer README file and also looked briefly at
 the code, but this seems to be something not related to
 particular implementation of algorithm (e.g. nested loop).
 Perhaps it's the way how cost estimates are propagated down

It could be as simple as not having tuned your cost factors to
accurately reflect the relative costs of different actions in your
environment.  If you are using the default configuration, you might
want to try a few of the adjustments that are most often needed (at
least in my experience):

cpu_tuple_cost = 0.03
random_page_cost = 2
effective_cache_size = 50% to 75% of machine RAM
work_mem = machine RAM * 0.25 / max_connections

You can SET these on an individual connection, one at a time or in
combination, and EXPLAIN the query to see the effects on plan
choice.

Other advice, not all of which matches my personal experience, can
be found here:

https://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server

The best thing to do is experiment with different values with your
own queries and workloads to see what most accurately models your
costs (and thus produces the fastest plans).

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Weirdly pesimistic estimates in optimizer

2015-03-02 Thread Tomas Vondra
Hi David ;-)

On 2.3.2015 20:19, David Kubečka wrote:

 The question is why optimizer, or rather the cost estimator,
 produced so much different estimates upon very small change in input.
 Moreover it seems that the main culprit of bad estimates isn't
 actually directly related to outer table, but rather to inner table.
 Just compare estimates for the two index scans:
 
 With 'random_fk_dupl':
  -  Index Scan using facts_fk_idx on facts  (cost=0.42..5.75
 rows=100 width=15) (actual time=0.009..0.117 rows=98 loops=100)
 With 'random_fk_uniq':
  -  Index Scan using facts_fk_idx on facts  (cost=0.42..214.26
 rows=100 width=15) (actual time=0.007..0.109 rows=98 loops=100)
 
 I have read the optimizer README file and also looked briefly at the
  code, but this seems to be something not related to particular 
 implementation of algorithm (e.g. nested loop). Perhaps it's the way 
 how cost estimates are propagated down (or sideways? that would be 
 weird...) the query tree. But I am really not sure, since this is my 
 first time lookng at the optimizer code base. I should also add that 
 I have reproduced this behaviour for all versions of Pg from 9.2 up 
 to current devel.

Interesting. I've only spent a few minutes looking at this, but it
certainly is a bit strange that using smaller table with unique values
results in a slower plan than using a table with duplicities (but the
same number of unique values).

Another observation is that the planner does see other plans, because
tweaking the cost variables like this:

SET random_page_cost = 2;

produces this plan (which on my machine runs just a tad slower than the
first query in your post):

QUERY PLAN
-
 HashAggregate  (cost=10564.81..10688.47 rows=9893 width=15)
   Group Key: facts.fk
   -  Nested Loop  (cost=5.42..10515.34 rows=9893 width=15)
   -  HashAggregate  (cost=2.24..3.23 rows=99 width=4)
 Group Key: random_fk_uniq.fk
 -  Seq Scan on random_fk_uniq  (cost=0.00..1.99 ...)
   -  Bitmap Heap Scan on facts  (cost=3.18..105.18 rows=100 ...)
   Recheck Cond: (fk = random_fk_uniq.fk)
   -  Bitmap Index Scan on facts_fk_idx  (cost=0.00..3.15 ...)
 Index Cond: (fk = random_fk_uniq.fk)

I can get similar results by setting cpu_operator_cost=0.005. And
further lowering to random_page_cost to 1.1 actually gets me this:

QUERY PLAN
-
 HashAggregate  (cost=6185.41..6309.08 rows=9893 width=15)
   Group Key: facts.fk
   -  Nested Loop  (cost=2.66..6135.95 rows=9893 width=15)
  -  HashAggregate  (cost=2.24..3.23 rows=99 width=4)
Group Key: random_fk_uniq.fk
-  Seq Scan on random_fk_uniq  (cost=0.00..1.99 ...)
  -  Index Scan using facts_fk_idx on facts  (cost=0.42..60.95 ...)
   Index Cond: (fk = random_fk_uniq.fk)

which is exactly the first plan from your post.

I still don't understand why using the smaller table produces less
efficient plan, but the estimated costs are very clost (20013 vs. 18147
is very small difference in this context), and the estimator shoots for
minimal average error. So it may seem 'weirdly pessimistic' but it might
be equally optimistic for other queries.

Also, keep in mind that the cost model is just a simplification of
reality, so it can't be correct 100% of the time.

The fact that this small difference in costs results in significant
duration difference suggests that the default optimizer cost values do
not reflect your environment. For example, you assume that this is very
fast:

NestLoop
- Seq Scan on A
- Index Scan using B_Y_IDX on B
Index Condition: B.Y = A.X

but index scans often produce a lot of random I/O, and that may be quite
expensive if it really hits the storage. And that's what the default
values (random_page_cost=4) is set for. But if you're on a system with
lots of RAM, or SSD, ... that simply is not the case, and may penalize
index scans (and prefer more sequential workloads).

The other thing you might do is creating index on (fk,f), which on the
new releases produces this:

   QUERY PLAN
-
 HashAggregate  (cost=783.77..907.43 rows=9893 width=15)
   Group Key: facts.fk
   -  Nested Loop  (cost=2.66..734.30 rows=9893 width=15)
   -  HashAggregate  (cost=2.24..3.23 rows=99 width=4)
   Group Key: random_fk_uniq.fk
   -  Seq Scan on random_fk_uniq  (cost=0.00..1.99 ...)
   -  Index Only Scan using facts_2_idx on facts  (cost=...)
   Index Cond: (fk = random_fk_uniq.fk)

Which limits the amount of random I/O by only scanning the index, and is
even faster than the first query.

regard

-- 
Tomas Vondrahttp://www.2ndQuadrant.com/