Hi Henson,

> Hi Tatsuo, Jian,
> 
> While doing a self-review pass over the incremental fixes on top of v47 I
> ran into two issues where I'd rather agree on an approach with you before
> I pick one.  One of them is a regression I introduced myself in the DEFINE
> memory-leak fix; the other is an original design point from v47.  There is
> also a third bug which I plan to handle together with the second one, since
> it can be affected by that change -- I describe it at the end.
> 
> I have verified both of the issues below on an assert-enabled build (and a
> non-assert build where relevant).
> 
> 
> == 1. DEFINE evaluation reuses the per-output-tuple context
> (use-after-free) ==
> 
> nocfbot-0039 (the DEFINE memory-leak fix) added a ResetExprContext() in
> update_reduced_frame, but it resets the wrong context.
> 
> ps_ExprContext is the per-output-tuple context that ExecWindowAgg resets
> once per output row.  update_reduced_frame now resets it once per NFA row,
> while the output row is still being formed -- so a pass-by-ref window
> function result already datum-copied into that per-tuple memory (when
> numfuncs > 1) is freed before ExecProject reads it.
> 
> Minimal trigger -- a pass-by-ref window function plus a second one over an
> RPR window:
> 
>   SELECT lag(company) OVER w, count(*) OVER w FROM stock
>    WINDOW w AS (PARTITION BY company ORDER BY tdate
>                 ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
>                 AFTER MATCH SKIP PAST LAST ROW INITIAL
>                 PATTERN (START UP+)
>                 DEFINE START AS TRUE, UP AS price > PREV(price));
> 
> On a CLOBBER_FREED_MEMORY build the lag column comes out as 0x7F garbage;
> in production it is garbage or a crash.  (An aggregate is not required --
> lag + first_value hits the same reset via the frame-access path.)
> 
> Neither v47 nor the patch is the answer on its own: v47 had no reset here,
> so no use-after-free, but the DEFINE scratch accumulated over the whole
> forward scan (the leak nocfbot-0039 fixed); nocfbot-0039 added the per-row
> reset but on the shared per-output-tuple context.  We do want a per-row
> reset -- just not on that context.
> 
> So I think this needs a dedicated ExprContext for DEFINE evaluation, reset
> once per NFA row: it keeps the memory bounded without touching the
> per-output-tuple results.
> 
> Question: does a dedicated DEFINE ExprContext look right to you?

Can we use winstate->tmpcontext instead?
> 
> == 2. PREV/NEXT/FIRST/LAST placeholders collide with user functions ==
> 
> The nav operations are polymorphic pg_catalog functions (anyelement, OIDs
> 8126-8133) recognized by funcid in parse_func.c, which collides with
> same-name user functions.
> 
> Outside DEFINE, a same-name function masks or clashes with the placeholder:
> with public.last(anyelement), SELECT last(123) fails "cannot use last
> outside a DEFINE clause"; with public.next(numeric), SELECT next(10) fails
> "function next(integer) is not unique"; and even with no user function,
> last(123) errors instead of "function last(integer) does not exist".
> 
> Inside DEFINE, a same-name function with an exact-type match beats the
> anyelement placeholder, so PREV(price) silently becomes a plain FuncExpr
> instead of an RPRNavExpr -- a wrong match result with no error (reproduced
> for numeric, text and int).  And ruleutils deparses a bare PREV(, so
> reparsing a view under a search_path with public.prev rebinds it (pg_dump
> is safe via search_path = '').
> 
> This is original v47 design, not a regression.  Per the standard,
> PREV/NEXT/FIRST/LAST are navigation operations with dedicated syntax, not
> general-namespace functions -- the collision comes from mapping them onto
> catalog functions plus search-path resolution.
> 
> I haven't found a clean approach yet.  Inside DEFINE these names have to be
> the navigation operation (per the standard), yet outside DEFINE they
> shouldn't shadow or break same-name user functions the way the catalog
> placeholders do -- and since the deparse output is unqualified (a bare
> PREV(...)), whatever we choose also has to round-trip cleanly.  I'm not
> sure how best to reconcile those.
> 
> My rough leaning is to not add catalog functions for these at all: leave
> resolution outside DEFINE exactly as it is today, and only inside DEFINE
> adjust the function-resolution path itself to recognize the navigation
> operations.  But that is still quite abstract.
> 
> Question: how would you approach this?

I need more time think about this.

Regards,
--
Tatsuo Ishii
SRA OSS K.K.
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp


Reply via email to