On Mon, Jun 1, 2026 at 1:35 PM Henson Choi <[email protected]> wrote:
>
> Hi Tatsuo, Jian,
>
> While tidying RPR comments I found a small inconsistency in the varId bounds.
> The comment/README side I'm already fixing in the in-progress series; whether
> to also change the bounds is a separate follow-up. As lead author that one is
> ultimately your call, Tatsuo, but I'd welcome Jian's and the list's input on
> it first.
>
> The current state, in src/include/optimizer/rpr.h:
>
> #define RPR_VARID_MAX 251
> #define RPR_VARID_BEGIN 252 /* control codes 252..255 */
> ... END 253, ALT 254, FIN 255
>
> RPRElemIsVar(e) == ((e)->varId <= RPR_VARID_MAX) /* 0..251 */
>
> and the limit enforced in parse_rpr.c:
>
> if (list_length(*varNames) >= RPR_VARID_MAX) /* reject the 252nd */
> ereport(ERROR, "too many pattern variables", "Maximum is 251");
>
> So 251 variables are accepted as varId 0..250, leaving 251 a hole: never
> assigned, yet the macro still classifies it as a variable -- one wider than
> the comment's own "0 to RPR_VARID_MAX - 1".
>
> RPRVarId is a uint8, kept small on purpose: varId is the likely per-row
> match-history key, and since a match can run arbitrarily long the history
> grows with it -- so one byte per row, not two, is what keeps that footprint
> in check.
>
> The catch of staying in uint8: the four control codes already fill 252..255,
> so 251 is the only free slot for any future sentinel (anchor ^/$, exclusion
> {- -}) short of widening to uint16. So the hole is really the last reserve.
>
> Three ways, by what the gap is spent on:
>
> (1) Leave it -- just the doc alignment already underway: 251 stays a
> documented
> reserve, macro unchanged. No follow-up commit. The one free slot is then
> on hand for a single future control code, should one ever be needed.
>
> (2) Fill it as a 252nd variable (0..251). Compatible and doable anytime; a
> few
> lines in parse_rpr.c / rpr.h plus the boundary test. But it spends the
> last free slot, so a future control code would then force either a
> compatibility-breaking narrow of RPR_VARID_MAX or a widen to two bytes
> (doubling history). Maximal variables now, the control question deferred.
>
> (3) Reserve 16 control codes now (4 used + 12 spare) at the 0xF0 boundary:
> vars 0..239, control 240..255, existing sentinels unmoved, macro becomes
> (varId & 0xF0) != 0xF0. Buys 12-code headroom inside the byte, so history
> stays 1 byte and (2)'s fork never arises. Same edit shape as (2); costs
> only the nominal drop to 240 variables -- but it is a narrowing, so free
> only pre-release.
>
> Which would you prefer?
>
3.
240 variables is enough, as each variable supports multiple complex AND/OR
conditions. Additionally, since PostgreSQL regular expressions use 14 special
characters, reserving the remaining ones in advance is a future-proof approach.
----------------------------------------------------
src/backend/executor/README.rpr
XII-4. Memory Pool Management
Choice: Custom free list
Rationale:
- NFA states are created and destroyed in large numbers per row
- Avoids palloc/pfree overhead
- State size is variable (counts[] array), but within a single query
maxDepth is fixed, so all states have the same size
It would be better simply to mention that:
RPRNFAState and RPRNFAContext are allocated in a partition-lifespan
memory context; they will be destroyed in release_partition.
--------------------------
in ExecRPRFreeContext:
{
if (ctx->states != NULL)
nfa_state_free_list(winstate, ctx->states);
if (ctx->matchedState != NULL)
nfa_state_free(winstate, ctx->matchedState);
}
If ctx->matchedState points to one of the states already in ctx->states, will
nfa_state_free() be called on the same RPRNFAState twice? Is this double-free
permitted, or do we have a mechanism in place to guard against it?
--------------------------
In ExecRPRProcessRow
if (currentPos == ctxFrameEnd)
{
/* Frame boundary reached: force mismatch */
nfa_match(winstate, ctx, NULL);
continue;
}
If I comment out the CONTINUE, the entire regression still succeeds.
--
jian
https://www.enterprisedb.com/