Re: [HACKERS] Changed SRF in targetlist handling

Andres Freund Mon, 01 Aug 2016 01:24:45 -0700

On 2016-05-25 16:55:23 -0400, Tom Lane wrote:
> Andres Freund <[email protected]> writes:
> > On 2016-05-25 15:20:03 -0400, Tom Lane wrote:
> >> We could certainly make a variant behavior in nodeFunctionscan.c that
> >> emulates that, if we feel that being exactly bug-compatible on the point
> >> is actually what we want.  I'm dubious about that though, not least
> >> because I don't think *anyone* actually believes that that behavior isn't
> >> broken.  Did you read my upthread message suggesting assorted compromise
> >> choices?
>
> > You mean 
> > https://www.postgresql.org/message-id/[email protected] ?
> > If so, yes.
>
> > If we go with rewriting this into LATERAL, I'd vote for 2.5 (trailed by
> > option 1), that'd keep most of the functionality, and would break
> > visibly rather than invisibly in the cases where not.
>
> 2.5 would be okay with me.
>
> > I guess you're not planning to work on this?
>
> Well, not right now, as it's clearly too late for 9.6.  I might hack on
> it later if nobody beats me to it.


FWIW, as it's blocking my plans for executor related rework (expression
evaluation, batch processing) I started to hack on this.

I've an implementation that

1) turns all targetlist SRF (tSRF from now on) into ROWS FROM
   expressions. If there's tSRFs in the argument of a tSRF those becomes
   a separate, lateral, ROWS FROM expression.

2) If grouping/window functions are present, the entire query is wrapped
   in a subquery RTE, except for the set-returning function. All
   referenced Var|Aggref|GroupingFunc|WindowFunc|Param nodes in the
   original targetlist are made to reference that subquery, which gets a
   TargetEntry for them.

3) If sortClause does *not* reference any tSRFs the sorting is evaluated
   in a subquery, to preserve the output ordering of SRFs in queries
   like
   SELECT id, generate_series(1,3) FROM (VALUES(1),(2)) d(id) ORDER BY id DESC;
   if in contrast sortClause does reference the tSRF output, it's
   evaluated in the outer SRF.

this seems to generally work, and allows to remove considerable amounts
of code.

So far I have one problem without an easy solution: Historically queries
like
=# SELECT id, generate_series(1,2) FROM (VALUES(1),(2)) few(id);
┌────┬─────────────────┐
│ id │ generate_series │
├────┼─────────────────┤
│  1 │               1 │
│  1 │               2 │
│  2 │               1 │
│  2 │               2 │
└────┴─────────────────┘
have preserved the SRF output ordering. But by turning the SRF into a
ROWS FROM, there's no guarantee that the cross join between "few" and
generate_series(1,3) above is implemented in that order. I.e. we can get
something like
┌────┬─────────────────┐
│ id │ generate_series │
├────┼─────────────────┤
│  1 │               1 │
│  2 │               1 │
│  1 │               2 │
│  2 │               2 │
└────┴─────────────────┘
because it's implemented as
┌──────────────────────────────────────────────────────────────────────────────┐
│                                  QUERY PLAN                                  │
├──────────────────────────────────────────────────────────────────────────────┤
│ Nested Loop  (cost=0.00..35.03 rows=2000 width=8)                            │
│   ->  Function Scan on generate_series  (cost=0.00..10.00 rows=1000 width=4) │
│   ->  Materialize  (cost=0.00..0.04 rows=2 width=4)                          │
│         ->  Values Scan on "*VALUES*"  (cost=0.00..0.03 rows=2 width=4)      │
└──────────────────────────────────────────────────────────────────────────────┘

I right now see no easy and nice-ish way to constrain that.


Besides that I'm structurally wondering whether turning the original
query into a subquery is the right thing to do. It requires some kind of
ugly munching of Query->*, and has the above problem. One alternative
would be to instead perform the necessary magic in grouping_planner(),
by "manually" adding nestloop joins before/after create_ordered_paths()
(depending on SRFs being referenced in the sort clause).  That'd create
plans we'd not have created so far, by layering NestLoop and
FunctionScan nodes above the normal query - that'd allow us to to easily
force the ordering of SRF evaluation.


If we go the subquery route, I'm wondering about where to tackle the
restructuring. So far I'm doing it very early in subquery_planner() -
otherwise the aggregation/sorting/... behaviour is easier to handle.
Perhaps doing it in standard_planner() itself would be better though.
An alternative approach would be to do this during parse-analysis, but I
think that might end up being confusing, because stored rules would
suddenly have a noticeably different structure, and it'd tie us a lot
more to the details of that transformation than I'd like.


Besides the removal of the least-common-multiple behaviour of tSRF queries,
there's some other consequences that using function scans have:
Previously if a tSRF was never evaluated, it didn't cause the number of
rows from being increased. E.g.
SELECT id, COALESCE(1, generate_series(1,2)) FROM (VALUES(1),(2)) few(id);
only produced two rows.  But using joins means that a simple
implementation of using ROWS FROM returns four rows.   We could try to
inject sufficient join conditions in that type of query, to prune down
the number of rows again, but I really don't want to go there - it's
kinda hard in the general case...


Comments?


Regards,


Andres


-- 
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Changed SRF in targetlist handling

Reply via email to