Re: [HACKERS] 9.2rc1 produces incorrect results

2012-09-06 Thread Tom Lane
I wrote:

 Attached is a draft patch against HEAD for this.

I've finished back-porting this.  I'm not going to commit it until 9.2.0
is definitely gold, but attached is the 9.1 version of the patch, if
you'd like to try it and verify that it fixes your original problem.

regards, tom lane



binwCG5SIiSLv.bin
Description: planner-param-handling-9.1.patch.gz

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 9.2rc1 produces incorrect results

2012-09-05 Thread Vik Reykja
On Wed, Sep 5, 2012 at 6:09 AM, Tom Lane t...@sss.pgh.pa.us wrote:

 I wrote:
  I think probably the best fix is to rejigger things so that Params
  assigned by different executions of SS_replace_correlation_vars and
  createplan.c can't share PARAM_EXEC numbers.  This will result in
  rather larger ecxt_param_exec_vals arrays at runtime, but the array
  entries aren't very large, so I don't think it'll matter.

 Attached is a draft patch against HEAD for this.  I think it makes the
 planner's handling of outer-level Params far less squishy than it's ever
 been, but it is rather a large change.  Not sure whether to risk pushing
 it into 9.2 right now, or wait till after we cut 9.2.0 ... thoughts?


I am not in a position to know what's best for the project but my company
can't upgrade (from 9.0) until this is fixed.  We'll wait for 9.2.1 if we
have to.  After all, we skipped 9.1.


Re: [HACKERS] 9.2rc1 produces incorrect results

2012-09-05 Thread Thom Brown
On 5 September 2012 05:09, Tom Lane t...@sss.pgh.pa.us wrote:
 I wrote:
 I think probably the best fix is to rejigger things so that Params
 assigned by different executions of SS_replace_correlation_vars and
 createplan.c can't share PARAM_EXEC numbers.  This will result in
 rather larger ecxt_param_exec_vals arrays at runtime, but the array
 entries aren't very large, so I don't think it'll matter.

 Attached is a draft patch against HEAD for this.  I think it makes the
 planner's handling of outer-level Params far less squishy than it's ever
 been, but it is rather a large change.  Not sure whether to risk pushing
 it into 9.2 right now, or wait till after we cut 9.2.0 ... thoughts?

Just so someone else has tested the case in question, here's the
result this end:

 id | array
+---
  1 | {1}
  1 | {1}
(2 rows)


QUERY PLAN
---
 Result  (cost=131.45..133.07 rows=8 width=36)
   CTE a
 -  Nested Loop  (cost=87.18..131.09 rows=7 width=4)
   -  Merge Right Join  (cost=87.18..123.33 rows=7 width=4)
 Merge Cond: (((pg_c.relname)::text) = ((t2.id)::text))
 Filter: (pg_c.oid IS NULL)
 -  Sort  (cost=22.82..23.55 rows=291 width=68)
   Sort Key: ((pg_c.relname)::text)
   -  Seq Scan on pg_class pg_c
(cost=0.00..10.91 rows=291 width=68)
 -  Sort  (cost=64.36..66.84 rows=993 width=4)
   Sort Key: ((t2.id)::text)
   -  Seq Scan on t2  (cost=0.00..14.93 rows=993 width=4)
   -  Index Only Scan using t1_pkey on t1  (cost=0.00..1.10
rows=1 width=4)
 Index Cond: (id = t2.id)
   CTE b
 -  WindowAgg  (cost=0.24..0.36 rows=7 width=4)
   -  Sort  (cost=0.24..0.26 rows=7 width=4)
 Sort Key: a.id
 -  CTE Scan on a  (cost=0.00..0.14 rows=7 width=4)
   -  Append  (cost=0.00..1.62 rows=8 width=36)
 -  CTE Scan on a  (cost=0.00..0.77 rows=4 width=4)
   Filter: is_something
   SubPlan 3
 -  CTE Scan on b  (cost=0.00..0.16 rows=1 width=4)
   Filter: (id = a.id)
 -  CTE Scan on a  (cost=0.00..0.77 rows=4 width=4)
   Filter: is_something
   SubPlan 4
 -  CTE Scan on b  (cost=0.00..0.16 rows=1 width=4)
   Filter: (id = a.id)
(30 rows)

As for shipping without the fix, I'm torn on whether to do so or not.
I imagine most productions will wait for a .1 or .2 release, and use
.0 for migration testing.  Plus this bug hasn't been hit (or at least
not noticed) during 5 releases of 9.1, and there isn't enough time
left before shipping to expose the changes to enough testing in the
areas affected, so I'd be slightly inclined to push this into 9.1.6
and 9.2.1.

Regards

Thom


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 9.2rc1 produces incorrect results

2012-09-05 Thread Tom Lane
Thom Brown t...@linux.com writes:
 On 5 September 2012 05:09, Tom Lane t...@sss.pgh.pa.us wrote:
 Attached is a draft patch against HEAD for this.  I think it makes the
 planner's handling of outer-level Params far less squishy than it's ever
 been, but it is rather a large change.  Not sure whether to risk pushing
 it into 9.2 right now, or wait till after we cut 9.2.0 ... thoughts?

 As for shipping without the fix, I'm torn on whether to do so or not.
 I imagine most productions will wait for a .1 or .2 release, and use
 .0 for migration testing.  Plus this bug hasn't been hit (or at least
 not noticed) during 5 releases of 9.1, and there isn't enough time
 left before shipping to expose the changes to enough testing in the
 areas affected, so I'd be slightly inclined to push this into 9.1.6
 and 9.2.1.

Yeah, after sleeping on it that's my feeling as well.  The patch needs
some rework for back branches anyway (since a nontrivial part of it
is touching LATERAL support that doesn't exist before HEAD).  I'll
push the fix to HEAD but wait till after 9.2.0 wrap for the back
branches.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 9.2rc1 produces incorrect results

2012-09-05 Thread Tom Lane
BTW, after considerable fooling around with Vik's example, I've been
able to produce a regression test case that fails in all PG versions
with WITH:

with
A as ( select q2 as id, (select q1) as x from int8_tbl ),
B as ( select id, row_number() over (partition by id) as r from A ),
C as ( select A.id, array(select B.id from B where B.id = A.id) from A )
select * from C;

The correct answer to this is

id |array
---+-
   456 | {456}
  4567890123456789 | {4567890123456789,4567890123456789}
   123 | {123}
  4567890123456789 | {4567890123456789,4567890123456789}
 -4567890123456789 | {-4567890123456789}
(5 rows)

as you can soon convince yourself by inspecting the contents of
int8_tbl:

q1|q2 
--+---
  123 |   456
  123 |  4567890123456789
 4567890123456789 |   123
 4567890123456789 |  4567890123456789
 4567890123456789 | -4567890123456789
(5 rows)

I got that answer with patched HEAD, but all the back branches
give me

id |array
---+-
   456 | {4567890123456789,4567890123456789}
  4567890123456789 | {4567890123456789,4567890123456789}
   123 | {123}
  4567890123456789 | {4567890123456789,4567890123456789}
 -4567890123456789 | {-4567890123456789}
(5 rows)

So this does indeed need to be back-patched as far as 8.4.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 9.2rc1 produces incorrect results

2012-09-04 Thread Tom Lane
Vik Reykja vikrey...@gmail.com writes:
 Hello.  It took me a while to get a version of this that was independent of
 my data, but here it is.  I don't understand what's going wrong but if you
 change any part of this query (or at least any part I tried), the correct
 result is returned.

Huh.  9.1 gets the wrong answer too, so this isn't a (very) new bug;
but curiously, 8.4 and 9.0 seem to get it right.  I think this is
probably related somehow to Adam Mackler's recent report --- multiple
scans of the same CTE seems to be a bit of a soft spot :-(

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 9.2rc1 produces incorrect results

2012-09-04 Thread Tom Lane
I wrote:
 Vik Reykja vikrey...@gmail.com writes:
 Hello.  It took me a while to get a version of this that was independent of
 my data, but here it is.  I don't understand what's going wrong but if you
 change any part of this query (or at least any part I tried), the correct
 result is returned.

 Huh.  9.1 gets the wrong answer too, so this isn't a (very) new bug;
 but curiously, 8.4 and 9.0 seem to get it right.  I think this is
 probably related somehow to Adam Mackler's recent report --- multiple
 scans of the same CTE seems to be a bit of a soft spot :-(

No, I'm mistaken: it's a planner bug.  The plan looks like this:

QUERY PLAN  
   
---
 Result  (cost=281.29..290.80 rows=20 width=36)
   CTE a
 -  Nested Loop  (cost=126.96..280.17 rows=19 width=4)
   -  Merge Right Join  (cost=126.96..220.17 rows=19 width=4)
 Merge Cond: (((pg_c.relname)::text) = ((t2.id)::text))
 Filter: (pg_c.oid IS NULL)
 -  Sort  (cost=61.86..63.72 rows=743 width=68)
   Sort Key: ((pg_c.relname)::text)
   -  Seq Scan on pg_class pg_c  (cost=0.00..26.43 
rows=743 width=68)
 -  Sort  (cost=65.10..67.61 rows=1004 width=4)
   Sort Key: ((t2.id)::text)
   -  Seq Scan on t2  (cost=0.00..15.04 rows=1004 width=4)
   -  Index Scan using t1_pkey on t1  (cost=0.00..3.14 rows=1 width=4)
 Index Cond: (id = t2.id***)
   CTE b
 -  WindowAgg  (cost=0.78..1.12 rows=19 width=4)
   -  Sort  (cost=0.78..0.83 rows=19 width=4)
 Sort Key: a.id
 -  CTE Scan on a  (cost=0.00..0.38 rows=19 width=4)
   -  Append  (cost=0.00..9.51 rows=20 width=36)
 -  CTE Scan on a  (cost=0.00..4.66 rows=10 width=4)
   Filter: is_something
   SubPlan 3
 -  CTE Scan on b  (cost=0.00..0.43 rows=1 width=4)
   Filter: (id = a.id***)
 -  CTE Scan on a  (cost=0.00..4.66 rows=10 width=4)
   Filter: is_something
   SubPlan 4
 -  CTE Scan on b  (cost=0.00..0.43 rows=1 width=4)
   Filter: (id = a.id***)
(30 rows)

The planner is assigning a single PARAM_EXEC slot for the parameter
passed into the inner indexscan in CTE a and for the parameters passed
into the two SubPlans that scan CTE b (the items I marked with ***
above).  This is safe enough so far as the two SubPlans are concerned,
because they don't execute concurrently --- but when SubPlan 3 is first
fired, it causes the remainder of CTE a to be computed, so that the
nestloop gets iterated some more times, and that overwrites the value of
a.id that already got passed down into SubPlan 3.

The reason this is so hard to replicate is that the PARAM_EXEC slot can
only get re-used for identical-looking Vars (same varno, varlevelsup,
vartype, etc) --- so even granted that you've got the right shape of
plan, minor unrelated changes in the query can stop the aliasing
from occurring.  Also, inner indexscans weren't using the PARAM_EXEC
mechanism until 9.1, so that's why the example doesn't fail before 9.1.

I've always been a bit suspicious of the theory espoused in
replace_outer_var that aliasing different Params is OK:

 * NOTE: in sufficiently complex querytrees, it is possible for the same
 * varno/abslevel to refer to different RTEs in different parts of the
 * parsetree, so that different fields might end up sharing the same Param
 * number.  As long as we check the vartype/typmod as well, I believe that
 * this sort of aliasing will cause no trouble.  The correct field should
 * get stored into the Param slot at execution in each part of the tree.

but I've never seen a provably wrong case before.  Most likely, this has
been broken since we introduced CTEs in 8.4: it's the decoupled timing
of execution of main query and CTE that's needed to allow execution of
different parts of the plan tree to overlap and thus create the risk.

(I get the impression that only recently have people been writing really
complex CTE queries, since we found another fundamental bug with them
just recently.)

I think probably the best fix is to rejigger things so that Params
assigned by different executions of SS_replace_correlation_vars and
createplan.c can't share PARAM_EXEC numbers.  This will result in
rather larger ecxt_param_exec_vals arrays at runtime, but the array
entries aren't very large, so I don't think it'll matter.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 9.2rc1 produces incorrect results

2012-09-04 Thread Tom Lane
I wrote:
 I think probably the best fix is to rejigger things so that Params
 assigned by different executions of SS_replace_correlation_vars and
 createplan.c can't share PARAM_EXEC numbers.  This will result in
 rather larger ecxt_param_exec_vals arrays at runtime, but the array
 entries aren't very large, so I don't think it'll matter.

Attached is a draft patch against HEAD for this.  I think it makes the
planner's handling of outer-level Params far less squishy than it's ever
been, but it is rather a large change.  Not sure whether to risk pushing
it into 9.2 right now, or wait till after we cut 9.2.0 ... thoughts?

regards, tom lane

diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 1f2bb6cc72f1242f14d55eee7cdc8e0e0d0775a9..02a0f62a53a4e3d06a3ad48d523e959d5d6b2ab7 100644
*** a/src/backend/nodes/outfuncs.c
--- b/src/backend/nodes/outfuncs.c
*** _outPlannerGlobal(StringInfo str, const 
*** 1666,1672 
  	WRITE_NODE_TYPE(PLANNERGLOBAL);
  
  	/* NB: this isn't a complete set of fields */
- 	WRITE_NODE_FIELD(paramlist);
  	WRITE_NODE_FIELD(subplans);
  	WRITE_BITMAPSET_FIELD(rewindPlanIDs);
  	WRITE_NODE_FIELD(finalrtable);
--- 1666,1671 
*** _outPlannerGlobal(StringInfo str, const 
*** 1674,1679 
--- 1673,1679 
  	WRITE_NODE_FIELD(resultRelations);
  	WRITE_NODE_FIELD(relationOids);
  	WRITE_NODE_FIELD(invalItems);
+ 	WRITE_INT_FIELD(nParamExec);
  	WRITE_UINT_FIELD(lastPHId);
  	WRITE_UINT_FIELD(lastRowMarkId);
  	WRITE_BOOL_FIELD(transientPlan);
*** _outPlannerInfo(StringInfo str, const Pl
*** 1688,1693 
--- 1688,1694 
  	WRITE_NODE_FIELD(parse);
  	WRITE_NODE_FIELD(glob);
  	WRITE_UINT_FIELD(query_level);
+ 	WRITE_NODE_FIELD(plan_params);
  	WRITE_BITMAPSET_FIELD(all_baserels);
  	WRITE_NODE_FIELD(join_rel_list);
  	WRITE_INT_FIELD(join_cur_level);
*** _outRelOptInfo(StringInfo str, const Rel
*** 1754,1759 
--- 1755,1761 
  	WRITE_FLOAT_FIELD(allvisfrac, %.6f);
  	WRITE_NODE_FIELD(subplan);
  	WRITE_NODE_FIELD(subroot);
+ 	WRITE_NODE_FIELD(subplan_params);
  	/* we don't try to print fdwroutine or fdw_private */
  	WRITE_NODE_FIELD(baserestrictinfo);
  	WRITE_NODE_FIELD(joininfo);
*** _outPlannerParamItem(StringInfo str, con
*** 1950,1956 
  	WRITE_NODE_TYPE(PLANNERPARAMITEM);
  
  	WRITE_NODE_FIELD(item);
! 	WRITE_UINT_FIELD(abslevel);
  }
  
  /*
--- 1952,1958 
  	WRITE_NODE_TYPE(PLANNERPARAMITEM);
  
  	WRITE_NODE_FIELD(item);
! 	WRITE_INT_FIELD(paramId);
  }
  
  /*
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 69a1b93b33746370457bff2daf4d4ece66535803..458dae0489c029bd743c75c82f8e5102067e89bf 100644
*** a/src/backend/optimizer/path/allpaths.c
--- b/src/backend/optimizer/path/allpaths.c
*** set_subquery_pathlist(PlannerInfo *root,
*** 1145,1150 
--- 1145,1153 
  	else
  		tuple_fraction = root-tuple_fraction;
  
+ 	/* plan_params should not be in use in current query level */
+ 	Assert(root-plan_params == NIL);
+ 
  	/* Generate the plan for the subquery */
  	rel-subplan = subquery_planner(root-glob, subquery,
  	root,
*** set_subquery_pathlist(PlannerInfo *root,
*** 1152,1157 
--- 1155,1164 
  	subroot);
  	rel-subroot = subroot;
  
+ 	/* Isolate the params needed by this specific subplan */
+ 	rel-subplan_params = root-plan_params;
+ 	root-plan_params = NIL;
+ 
  	/*
  	 * It's possible that constraint exclusion proved the subquery empty. If
  	 * so, it's convenient to turn it back into a dummy path so that we will
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index 5d3b293b88a3ee030adae2260520eda69caad4b7..030f420c90eb37946ee333250de54af61d9b82d7 100644
*** a/src/backend/optimizer/plan/createplan.c
--- b/src/backend/optimizer/plan/createplan.c
*** static HashJoin *create_hashjoin_plan(Pl
*** 84,90 
  	 Plan *outer_plan, Plan *inner_plan);
  static Node *replace_nestloop_params(PlannerInfo *root, Node *expr);
  static Node *replace_nestloop_params_mutator(Node *node, PlannerInfo *root);
! static void identify_nestloop_extparams(PlannerInfo *root, Plan *subplan);
  static List *fix_indexqual_references(PlannerInfo *root, IndexPath *index_path);
  static List *fix_indexorderby_references(PlannerInfo *root, IndexPath *index_path);
  static Node *fix_indexqual_operand(Node *node, IndexOptInfo *index, int indexcol);
--- 84,91 
  	 Plan *outer_plan, Plan *inner_plan);
  static Node *replace_nestloop_params(PlannerInfo *root, Node *expr);
  static Node *replace_nestloop_params_mutator(Node *node, PlannerInfo *root);
! static void process_subquery_nestloop_params(PlannerInfo *root,
!  List *subplan_params);