Re: [HACKERS] Re: Query fails when SRFs are part of FROM clause (Commit id: 69f4b9c85f)

Andres Freund Wed, 12 Apr 2017 15:59:38 -0700

On 2017-04-05 09:39:37 -0400, Tom Lane wrote:
> Andres Freund <and...@anarazel.de> writes:
> > On 2017-04-05 02:47:55 -0400, Noah Misch wrote:
> >> [Action required within three days.  This is a generic notification.]
>
> > I've a very preliminary patch.  I'd like to only start polishing it up
> > once the code freeze is over, so I can work on getting some patches in -
> > note that I myself have no pending patches.  Once frozen I'll polish it
> > up and send that within a few days.
>
> FWIW, I'm willing to help out on this.


Help would be appreciated.  I've pondered, and partially implemented,
several approaches so far, and my conclusion is that we should just do
nothing.  The amount of corner cases is just too big, and the utility of
the feature too small.

To recap, the issue is that in previous versions it was, by accident
(there's no test, code comments "supporting" the feature talk about a
different corner case, and the behaviour isn't correct in some cases)
allowed to do something like:
SELECT * FROM CAST(generate_series(1,3) * 5 AS int);
while
SELECT * FROM generate_series(1,3) * 5;
is not allowed.

The reason that that works from the gram.y perspective is that CAST etc
types of func_expr's.  The reason that it worked from a code perspective
is that execQual.c:ExecMakeTableFunctionResult() has the following:
        /*
         * Normally the passed expression tree will be a FuncExprState, since 
the
         * grammar only allows a function call at the top level of a table
         * function reference.  However, if the function doesn't return set then
         * the planner might have replaced the function call via 
constant-folding
         * or inlining.  So if we see any other kind of expression node, execute
         * it via the general ExecEvalExpr() code; the only difference is that 
we
         * don't get a chance to pass a special ReturnSetInfo to any functions
         * buried in the expression.
         */
        if (funcexpr && IsA(funcexpr, FuncExprState) &&
                IsA(funcexpr->expr, FuncExpr))
and back then ExecEvalExpr() was allowed to return sets.  It also
depends on some default initializations (e.g. rsinfo.returnMode =
SFRM_ValuePerCall).

This yields plenty weird behaviour in < v10. E.g. the following is
disallowed:
  SELECT * FROM int4mul(generate_series(1,3), 1);
  ERROR:  0A000: set-valued function called in context that cannot accept a set
as is
  SELECT * FROM CAST(int4mul(generate_series(1,3), 1) AS bigint);
  ERROR:  0A000: set-valued function called in context that cannot accept a set
because the cast is implemented as int8(expr) which avoids the fallback
path as it's a FuncExpr, but
  SELECT * FROM CAST(int4mul(generate_series(1,3), 1) AS text);
works because the cast is implemented as a io coercion, which is not a
funcexpr. Therefore it uses the fallback ExecEvalExpr().

The mismatch between ExecEvalExpr() and nodeFunctionscan.c SRF behaviour
back then also yields odd behaviour, e.g.:
  SELECT * FROM generate_series(1,0);
returns zero rows, but
  SELECT * FROM CAST(generate_series(1,0) * 5 AS INT);
returns one NULL row.


In v10, as it stands, these all error out, because the SRFs are now only
to be evaluated via either nodeFunctionscan.c (FROM) or via
nodeProjectSet.c (targetlist), not ExecEvalExpr() anymore.


I've basically pondered three different methods of implementing
something akin to the old behaviour:

1) Move the non-SRF part into nodeFunctionscan.c's targetlist, and let
   it be evaluated there.  E.g. if the expression is
   CAST(generate_series(1,5) AS text), evaluate the generate_series(1,5)
   using nodeFunctionscan's FunctionScanPerFuncState machinery, but
   implement the CAST as CAST(Var(whatever, 1) AS Text).

   That doesn't work well for composite/record returning rows, because
   RangeTblFunction's returning composites are expanded into multiple
   columns. E.g.
   SELECT * FROM CAST(CAST(twocol_noset_outline(generate_series(1,3), 'text') 
AS text) AS twocol);
   returns all the columns of twocol, not a single wholerow datum.

   There's also some issues around what to do with cases involving
   volatile functions when the output is not referenced, or referenced
   multiple times e.g.
     SELECT f, f FROM CAST(generate_series(1,3) * nextval(...)) AS f;
   would evaluate nextval() twice with such an approach...

2) During planning, split of the SRF bit from the !SRF bit and have
   nodeFunctionscan.c evaluate both separately, like 1).  That allows to
   avoid the volatility issue, because the targetlist is still projected
   separately.

   I've prototyped this to a reasonable point, by having
   create_functionscan_plan() process each RangeTblFunction and split
   the expression into SRF and non-SRF parts, and then have
   FunctionNext() evaluate the non-SRF part.

   At the current state of my prototype this happens to allow simple SRF
   in arguments cases like SELECT * FROM int4mul(generate_series(1,3), 1)
   which are disallowed in < v10.


   This is reasonably-ish in complexity for scalar values, but gets
   quite complicated for composite/record datums.  Suddenly there's two
   different types of values that we need to deal with, the type of
   datum returned by the SRF, and the type of datum returned by the
   final expression - they might be the same, they might not.
   Especially for record-returning functions, where ROWS FROM( AS...)
   determines the column types, and ExecMakeTableFunctionResult() does a
   tupledesc_match() to verify the SRF returned something reasonable,
   it gets wild: What would we even be matching the types against?


3) Implement nested FROM SRFs as pipelined executor nodes, similar to
   the way targetlist SRFs are now handled.  E.g. something like
   SELECT * FROM CAST(generate_series(1,10) * 5 AS int);

   would be implemented as one nodeFunctionscan.c for the
   generate_series(), and then something like nodeProjectSet.c (or a
   nodeFunctionscan.c branch that'd make a sub-node available) would
   evaluate the CAST(Var() * 5 AS int).

   This approach has the advantage that it'd allow us, potentially in
   the future, to get rid of the restriction that normal ROWS FROM
   doesn't allow SRFs in arguments.

   I think this again runs into trouble with row-expansion in the return
   value from SRFs, which isn't actually desired till the outermost
   "level" of the expression.  I guess we could add a mode to
   nodeFunctionscan.c that forces composite/record types to be returned
   as wholerow vars instead of being split up.


My conclusion here is that it's way too complicated to properly
implement a feature that only seems to exist by accident and has plenty
of weird behaviour.  Currently we continue to accept all the !SRF
expressions that were previously accepted, but I'd even consider
insisting that the expression needs to be a proper FuncExpr at
parse-analysis time (before inlining/const evaluation did its thing).

Unless somebody has a radically better idea how to implement this?


- Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Re: Query fails when SRFs are part of FROM clause (Commit id: 69f4b9c85f)

Reply via email to