Re: [HACKERS] row estimation off the mark when generate_series calls are involved

2010-04-20 Thread Nikhil Sontakke
Hi,

> I'm not very impressed with that patch, even discounting the
> sum-vs-product thinko.  Trawling the tlist for SRFs will add a significant
> number of cycles, to modify the rowcount in a way that is practically
> always wrong (since the estimates for SRF output rowcounts are so bad).

It's trawling *just* the tlist and not the entire query. Given that
rowcount estimates are way off the mark for SRF targets, it seems
worth the cycles. It's not like sane queries will have very many tlist
entries to make this a costly affair IMO.

> What's more, most of the time we don't really care, because the
> top-level rowcount estimate is of no interest for future planning
> purposes.  It might be worth doing something about this inside
> sub-selects, but not till we have less-bogus SRF rowcount estimates.


But SRF rowcount estimates will generally be bogus, no? Unless we come
up with a mechanism to gather statistics based on plan-time arguments,
that will still be the case.

To cite Itagaki-san's example again:

INSERT INTO public.x SELECT generate_series(1,1000);

If we had valid row estimates, we could choose a better plan, for
example materializing the SELECT portion into a temp table and then
inserting into the target. Performance might be much better in that
case.

Regards,
Nikhils
-- 
http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] row estimation off the mark when generate_series calls are involved

2010-04-19 Thread Nikhil Sontakke
Another email which went into the wilderness when I sent it to pgsql-patches.

Regards,
Nikhils


-- Forwarded message --
From: Nikhil Sontakke <nikhil.sonta...@enterprisedb.com>
Date: Fri, Apr 16, 2010 at 6:50 PM
Subject: row estimation off the mark when generate_series calls are involved
To: pgsql-patc...@postgresql.org


Hi,

I observed the following behavior on PG head:

postgres=# create table x(x int);
CREATE TABLE
postgres=# explain verbose insert into public.x values (generate_series(1,10));

 Insert  (cost=0.00..0.01 rows=1 width=0)

postgres=# explain verbose insert into public.x values
(generate_series(1,1000));

 Insert  (cost=0.00..0.01 rows=1 width=0)

So even though generate_series has a prorows value of 1000 (why did we
pick this value? Just a guesstimate, I suppose), its effects are not
reflected in the plan at all. I think that at the place where we set
the result_plan's targetlist to sub_tlist, we should immediately
afterwards update the plan_rows estimate by walking this new
targetlist. I did that, and now we seem to get proper row estimates.

Comments?

Regards,
Nikhils
--
http://www.enterprisedb.com



-- 
http://www.enterprisedb.com
Index: src/backend/optimizer/plan/planner.c
===================================================================
RCS file: /repositories/postgreshome/cvs/pgsql/src/backend/optimizer/plan/planner.c,v
retrieving revision 1.267
diff -c -r1.267 planner.c
*** src/backend/optimizer/plan/planner.c	30 Mar 2010 21:58:10 -	1.267
--- src/backend/optimizer/plan/planner.c	16 Apr 2010 13:46:35 -
***************
*** 1241,1246 ****
--- 1241,1253 ----
  	 * the desired tlist.
  	 */
  	result_plan->targetlist = sub_tlist;
+ 
+ 	/*
+ 	 * Account for changes in plan row estimates because of
+ 	 * this tlist addition
+ 	 */
+ 	result_plan->plan_rows += clamp_row_est(
+ 		expression_returns_set_rows((Node *) result_plan->targetlist));
  }
  
  /*


Re: [HACKERS] row estimation off the mark when generate_series calls are involved

2010-04-19 Thread Takahiro Itagaki

Nikhil Sontakke <nikhil.sonta...@enterprisedb.com> wrote:

> I observed the following behavior on PG head:
> postgres=# explain verbose insert into public.x values
> (generate_series(1,10));
>
>  Insert (cost=0.00..0.01 rows=1 width=0)

Hmmm, there are differences between SELECT SRF() and SELECT * FROM SRF():

postgres=# EXPLAIN INSERT INTO public.x SELECT generate_series(1,10);
                   QUERY PLAN
------------------------------------------------
 Insert  (cost=0.00..0.02 rows=1 width=4)
   ->  Result  (cost=0.00..0.01 rows=1 width=0)

postgres=# EXPLAIN INSERT INTO public.x SELECT * FROM generate_series(1,10);
  QUERY PLAN
--
 Insert  (cost=0.00..10.00 rows=1000 width=4)
   ->  Function Scan on generate_series  (cost=0.00..10.00 rows=1000 width=4)


> I think the place where we set the
> targetlist of the result_plan to sub_tlist, immediately after that we
> should update the plan_rows estimate by walking this latest
> targetlist. I did that and now we seem to get proper row estimates.

I agree the estimation should be improved, but your patch *adds*
the estimated number of rows to the result:

postgres=# EXPLAIN INSERT INTO public.x SELECT generate_series(1,10);
QUERY PLAN
---------------------------------------------------
 Insert  (cost=0.00..12.52 rows=1002 width=4)
   ->  Result  (cost=0.00..2.51 rows=1001 width=0)

Should it be 1000 rather than 1001?

Regards,
---
Takahiro Itagaki
NTT Open Source Software Center




Re: [HACKERS] row estimation off the mark when generate_series calls are involved

2010-04-19 Thread Tom Lane
Takahiro Itagaki <itagaki.takah...@oss.ntt.co.jp> writes:
> Nikhil Sontakke <nikhil.sonta...@enterprisedb.com> wrote:
>> I think the place where we set the
>> targetlist of the result_plan to sub_tlist, immediately after that we
>> should update the plan_rows estimate by walking this latest
>> targetlist. I did that and now we seem to get proper row estimates.

> I agree the estimation should be improved, but your patch *adds*
> the estimated number of rows to the result:

I'm not very impressed with that patch, even discounting the
sum-vs-product thinko.  Trawling the tlist for SRFs will add a significant
number of cycles, to modify the rowcount in a way that is practically
always wrong (since the estimates for SRF output rowcounts are so bad).
What's more, most of the time we don't really care, because the
top-level rowcount estimate is of no interest for future planning
purposes.  It might be worth doing something about this inside
sub-selects, but not till we have less-bogus SRF rowcount estimates.

BTW, another reason for not being excited about this is that someday we
ought to disallow SRFs in the tlist altogether --- once we have LATERAL,
which might happen in 9.1, that will be a much more semantically
consistent way of getting the desired behavior.

regards, tom lane
