Hi,

It turns out, to my embarrassment, that pretty much all of the regression tests 
failed with my patch. No idea if anyone spotted that and withheld a reply in 
revenge, but I wouldn't blame you if you did!

I have spent a bit more time on it. The attached patch is in much better shape, 
though there are still a few regressions and it's undoubtedly still rough.

(Attached patch is against 12.0.)

As was perhaps predictable, some of the regression tests do indeed break in the 
rescan case. The specific class of failure looks like this:

SELECT * FROM (VALUES (1),(2),(3)) v(r), ROWS FROM( rngfunc_sql(11,11), 
rngfunc_mat(10+r,13) );
  r | i  | s | i  | s 
 ---+----+---+----+---
  1 | 11 | 1 | 11 | 1
  1 |    |   | 12 | 2
  1 |    |   | 13 | 3
- 2 | 11 | 1 | 12 | 4
+ 2 | 11 | 2 | 12 | 4
  2 |    |   | 13 | 5
- 3 | 11 | 1 | 13 | 6
+ 3 | 11 | 3 | 13 | 6
 (6 rows)

The reason for the change is that 's' comes from rngfunc_mat(), which computes 
s as nextval(). The patch currently prefers to re-execute the function rather 
than materialise it into a tuplestore, so each rescan advances the sequence.
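For reference, here is a paraphrased sketch of the kind of function involved (an illustration only, not the actual regression-test definitions): because s is drawn from a sequence, re-executing the function on rescan produces fresh values, whereas replaying a tuplestore would repeat the originally computed ones.

```sql
-- Illustration only; the real regression-test definitions differ in detail.
CREATE SEQUENCE rngfunc_seq;

CREATE FUNCTION rngfunc_mat(a int, b int)
  RETURNS TABLE (i int, s bigint) AS
$$ SELECT i, nextval('rngfunc_seq') FROM generate_series(a, b) i $$
LANGUAGE sql;
```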

Tom suggested not dropping the tuplestore creation logic, and I can't see a way 
of preserving the current behaviour for folks who have come to rely on it 
without doing that. So I'm tempted to pipeline rows back from a function (if it 
returns ValuePerCall) while also recording them in a tuplestore, in case a 
rescan happens. There is still some wastage in this approach, but it would 
preserve the current behaviour while still enabling the early abort of 
ValuePerCall SRFs at relatively low cost, which is certainly one of my goals.
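To illustrate the early-abort benefit with a built-in ValuePerCall SRF (this uses the standard generate_series, not the patch): with pipelining, the executor can stop pulling rows as soon as the LIMIT is satisfied, instead of computing the entire result set into a tuplestore first.

```sql
-- ValuePerCall: only ~10 rows need be generated before the scan stops.
-- With forced materialisation, all 1000000 rows would be computed into a
-- tuplestore before the LIMIT ever applied.
SELECT * FROM generate_series(1, 1000000) g LIMIT 10;
```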

I'd welcome opinions on whether there are downsides to that approach, as I 
might move to integrate it next.

But I would also like to kick around ideas for how to avoid the tuplestore 
entirely.

Earlier, I suggested that we might make the decision logic prefer to 
materialise a tuplestore for VOLATILE functions, and prefer to pipeline 
directly from STABLE (and IMMUTABLE) functions. But the docs on volatility 
categories say that the optimiser will evaluate a VOLATILE function afresh for 
every row where its value is needed, whereas it may cache STABLE and IMMUTABLE 
results more aggressively. That is basically the polar opposite of what I want 
to achieve.

It is arguably also in conflict with current behaviour. I think we should make 
the docs clearer about that.

So, on second thoughts, I don’t think overloading the meaning of STABLE, et 
al., is the right thing to do. I wonder if we could invent a new modifier to 
CREATE FUNCTION, perhaps “PIPELINED”, which would simply declare a function's 
ability and preference for ValuePerCall mode.
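For instance, something along these lines (purely hypothetical syntax; no such modifier exists today):

```sql
-- Hypothetical: PIPELINED declares that the function supports, and prefers,
-- being consumed in ValuePerCall mode rather than being materialised.
CREATE FUNCTION rngfunc_sql(a int, b int) RETURNS SETOF int AS
$$ SELECT generate_series(a, b) $$
LANGUAGE sql PIPELINED;
```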

Or perhaps modify the ROWS FROM extension, and adopt WITH’s [ NOT ] 
MATERIALIZED clause. For example, the following would achieve the above 
proposed behaviour:

ROWS FROM( rngfunc_sql(11,11) MATERIALIZED, rngfunc_mat(10+r,13) MATERIALIZED ) 

Of course, NOT MATERIALIZED would achieve ValuePerCall mode, and omit 
materialisation. I guess MATERIALIZED would have to be the default.
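So the pipelined variant of the earlier query might look something like this (again, proposed syntax only, mixing the two modes per function):

```sql
-- Hypothetical: pipeline the first SRF, materialise the second.
ROWS FROM( rngfunc_sql(11,11) NOT MATERIALIZED, rngfunc_mat(10+r,13) MATERIALIZED )
```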

I wonder if another alternative would be to decide materialisation based on the 
shape of the outer plan. I guess we can tell whether we're part of a join, or 
whether the plan requires the ability to scan backwards. Could that work?
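As a concrete case where the plan itself demands retained rows, a scrollable cursor needs backward fetches, which pure ValuePerCall execution can't satisfy (sketch using the built-in generate_series):

```sql
BEGIN;
DECLARE c SCROLL CURSOR FOR SELECT * FROM generate_series(1, 10);
FETCH FORWARD 5 FROM c;
-- A backward fetch requires earlier rows to be retained somewhere,
-- e.g. in a tuplestore.
FETCH BACKWARD 2 FROM c;
COMMIT;
```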

denty.
