(2018/02/13 21:51), Ashutosh Bapat wrote:
Here's my analysis of the bug.
The node for which this error comes is a ConvertRowtypeExpr node with
Var::varattno = 0 under it. Whole row reference of the parent is converted to
ConvertRowtypeExpr with whole row of child as an argument. When partition-wise
join is used, targetlist of child-joins contain such ConvertRowtypeExpr when
the parent-join's targetlist has whole-row references of joining
When we deparse the targetlist of join pushed down by postgres FDW,
build_tlist_to_deparse() pulls only Var nodes nodes from the join's targetlist.
So it pulls Var reprensenting a whole-row reference of a child from a
ConvertRowtypeExpr, when building targetlist to be deparsed for a child-join
with whole-row references. This targetlist is then saved as fdw_scan_tlist in
This causes two problems shown by the two queries below
EXPLAIN (VERBOSE, COSTS OFF)
SELECT t1.c1, ss.a, ss.b FROM (SELECT c1 FROM pt1 WHERE c1 = 50) t1 INNER
JOIN (SELECT t2.c1, t3.c1 FROM (SELECT c1 FROM pt1 WHERE c1 between 50 and
60) t2 FULL JOIN (SELECT c1 FROM pt2 WHERE c1 between 50 and 60) t3 ON
(t2.c1 = t3.c1) WHERE t2.c1 IS NULL OR t2.c1 IS NOT NULL) ss(a, b) ON (TRUE)
ORDER BY t1.c1, ss.a, ss.b FOR UPDATE OF t1;
ERROR: unexpected expression in subquery output
get_relation_column_alias_ids() uses foreignrel's targetlist
(foreignrel->reltarget->exprs) as it is to locate given node to be deparsed.
If the joining relation corresponding to ConvertRowtypeExpr is deparsed as a
subquery, this function is called with whole-row reference node (Var node with
varattno = 0). But the relation's targetlist doesn't contain its whole-row
reference directly but has it embedded in ConvertRowtypeExpr. So, the function
doesn't find the given node and throws error.
EXPLAIN (VERBOSE, COSTS OFF)
SELECT t1.c1, ss.a, ss.b FROM (SELECT c1 FROM pt1) t1 INNER JOIN (SELECT
t2.c1, t3.c1 FROM (SELECT c1 FROM pt1) t2 FULL JOIN (SELECT c1 FROM pt1) t3
ON (t2.c1 = t3.c1)) ss(a, b) ON (TRUE) ORDER BY t1.c1, ss.a, ss.b FOR UPDATE
ERROR: variable not found in subplan target lists
When there is possibility of EvalPlanQual being called, we construct local
join plan matching the pushed down foreign join. In postgresGetForeignPlan()
after we have built the local join plan, the topmost plan node's targetlist is
changed to fdw_scan_tlist to match the output of the ForeignScan node. As
explained above, this targetlist contains a bare reference to whole-row
reference of a child relation if the child-join's targetlist contains a
ConvertRowtypeExpr. When changing the topmost plan node's targetlist, we do not
modify the targetlists of its left and right tree nodes. The left/right plan
involving corresponding child relation will have ConvertRowtypeExpr expression
in its targetlist, but not whole-row reference directly. When the topmost local
join plan node's targetlist is processed by set_plan_ref(), it throws error
"variable not found in subplan target lists" since it doesn't find bare
whole-row reference of the child relation in subplan's targetlists.
Thanks for the analysis!
The problem can be solved in two ways:
1. Push down ConvertRowtypeExpr and include it in the pushed down targetlist.
This would solve both the problems described above. Both set_plan_ref() and
get_relation_column_alias_ids() will find ConvertRowtypeExpr, they are looking
for and won't throw an error.
This requires two parts
a. In build_tlist_to_deparse(), instead of pulling
Var node from ConvertRowtypeExpr, we pull whole ConvertRowtypeExpr and include
it in the targetlist being deparsed which is also used as to set
fdw_scan_tlist. In order to pull any ConvertRowtypeExpr's in the local quals,
which may be hidden in the expression tree, we will add two more options to
flags viz. PVC_INCLUDE_CONVERTROWTYPEEXPR and PVC_RECURSE_CONVERTROWTYPEEXPR.
Unlike the other PVC_* options, which do not default to a value, we may want to
default to PVC_RECURSE_CONVERTROWTYPEEXPR if nothing is specified. That will
avoid, possibly updating every pull_var_clause call with
b. deparse ConvertRowtypeExpr
For this we need to get the conversion map between the parent and child. We
then deparse ConvertRowtypeExpr as a ROW() with the attributes of child
rearranged per the conversion map. A multi-level partitioned table will have
nested ConvertRowtypeExpr. To deparse such expressions, we need to find the
conversion map between the topmost parent and the child, by ignoring any
2. Modify the local join plan entirely to contain whole row reference of
child relations instead of ConvertRowtypeExpr and deparse a ConvertRowtypeExpr
as a whole-row reference in a subquery.
Again we need two part solution:
a. For this we need to write a walker which walks the plan tree distributing the
Vars in the topmost targetlist to the left and right subtrees. Thus we replace
ConvertRowtypeExpr with corresponding whole-row references in the whole plan
b. When get_relation_column_alias_ids() encounters a
ConvertRowtypeExpr, it pulls
out the embedded whole row reference and returns the corresponding column id.
deparseExpr() calls deparseVar() by pulling out the embedded whole-row
reference Var when it encouters ConvertRowtypeExpr.
I'd vote for #1, but ISTM that that's more like a feature, not a fix.
Pushing down ConvertRowtypeExprs to the remote seems to me to be the
same problem as pushing down PHVs to the remote, which is definitely a
feature development. I think a fix for this would be to just give up
pushing down a child join in foreign_join_ok or somewhere else if the
reltarget of the child-join relation contains any ConvertRowtypeExprs.