Re: Refactor query normalization into core query jumbling
> On Mon, Apr 6, 2026 at 11:28 PM Michael Paquier wrote: > > > > On Sun, Apr 05, 2026 at 05:13:40PM -0500, Sami Imseih wrote: > > > I took a look at v9 and it LGTM. > > > > I can also see that v9 had the idea to discard quite a few of the > > edits I did previously. Restored that, reworded one more place that > > was refering to query normalization in ComputeConstantLengths(), > > applied the result. We're in time at the end. > > Thanks for getting this in! Thanks Michael and Lukas! -- Sami
Re: Refactor query normalization into core query jumbling
On Mon, Apr 6, 2026 at 11:28 PM Michael Paquier wrote: > > On Sun, Apr 05, 2026 at 05:13:40PM -0500, Sami Imseih wrote: > > I took a look at v9 and it LGTM. > > I can also see that v9 had the idea to discard quite a few of the > edits I did previously. Restored that, reworded one more place that > was refering to query normalization in ComputeConstantLengths(), > applied the result. We're in time at the end. Thanks for getting this in! (and apologies that I dropped your edits, I was looking at Sami's v7 version when I put that together) Thanks, Lukas -- Lukas Fittl
Re: Refactor query normalization into core query jumbling
On Sun, Apr 05, 2026 at 05:13:40PM -0500, Sami Imseih wrote: > I took a look at v9 and it LGTM. I can also see that v9 had the idea to discard quite a few of the edits I did previously. Restored that, reworded one more place that was refering to query normalization in ComputeConstantLengths(), applied the result. We're in time at the end. -- Michael signature.asc Description: PGP signature
Re: Refactor query normalization into core query jumbling
> >> > I'm +-0 regarding this routine, but I can also see your point about how > >> > it's > >> > useful to give at least the option to extensions to have a > >> > recomputation of the Const lengths, the same way as PGSS. What are > >> > the extensions that would use that? > >> > >> https://github.com/search?q=fill_in_constant_lengths&type=code > >> > >> A few well-known extensions/tools out there based on a Github search. > > FWIW, this search points to a lot of forks > > > See attached a v9 that extracts ComputeConstantLengths from Sami's v7 > > for easier discussion. > > > > I've done an updated test with pg_stat_monitor [0] and pg_tracing [1] > > with that, and that has both extensions passing regressions tests. > > I've also looked but not tested a third extension (sql_firewall) and > > that looks like it should just work as well. > > Still I can see the gains here, so I guess that this version works > here. I'm happy enough to see the const with JumbleState. I took a look at v9 and it LGTM. -- Sami
Re: Refactor query normalization into core query jumbling
On Wed, Apr 01, 2026 at 12:55:28AM -0700, Lukas Fittl wrote: > On Tue, Mar 31, 2026 at 7:54 PM Sami Imseih wrote: >> >> > I'm +-0 regarding this routine, but I can also see your point about how >> > it's >> > useful to give at least the option to extensions to have a >> > recomputation of the Const lengths, the same way as PGSS. What are >> > the extensions that would use that? >> >> https://github.com/search?q=fill_in_constant_lengths&type=code >> >> A few well-known extensions/tools out there based on a Github search. FWIW, this search points to a lot of forks > See attached a v9 that extracts ComputeConstantLengths from Sami's v7 > for easier discussion. > > I've done an updated test with pg_stat_monitor [0] and pg_tracing [1] > with that, and that has both extensions passing regressions tests. > I've also looked but not tested a third extension (sql_firewall) and > that looks like it should just work as well. Still I can see the gains here, so I guess that this version works here. I'm happy enough to see the const with JumbleState. -- Michael signature.asc Description: PGP signature
Re: Refactor query normalization into core query jumbling
On Tue, Mar 31, 2026 at 7:54 PM Sami Imseih wrote: > > > I'm +-0 regarding this routine, but I can also see your point about how it's > > useful to give at least the option to extensions to have a > > recomputation of the Const lengths, the same way as PGSS. What are > > the extensions that would use that? > > https://github.com/search?q=fill_in_constant_lengths&type=code > > A few well-known extensions/tools out there based on a Github search. See attached a v9 that extracts ComputeConstantLengths from Sami's v7 for easier discussion. I've done an updated test with pg_stat_monitor [0] and pg_tracing [1] with that, and that has both extensions passing regressions tests. I've also looked but not tested a third extension (sql_firewall) and that looks like it should just work as well. Its worth noting that pg_tracing had a secondary use case for its fill_in_constant_lengths function, which is to find the first non-comment token in a query (to skip over leading comments). Not having that causes a small regression test difference. The maintainers will have to decide whether its worth keeping their own function, but I think that doesn't invalidate the utility of having this in core to me. Thanks, Lukas [0]: https://github.com/lfittl/pg_stat_monitor/commit/054f347344a380f7a59cb444685db71301669c61 [1]: https://github.com/lfittl/pg_tracing/commit/d48a36c9a2c8fc284cf63ec1161558b90d8b44ef -- Lukas Fittl v9-0001-Make-JumbleState-const-in-post_parse_analyze-hook.patch Description: Binary data
Re: Refactor query normalization into core query jumbling
> >> I see where you're coming from on that, but I don't think we can > >> remove anything here in practice: > > > > Yes. Not unless we want to rely on the parser to track the lengths in > > Const, which could be invasive, but I have not looked into it. > > Hmm. We may not want to get down to that, still that would be > cheaper than reprocessing the parsing of the query string twice. Although unless an extension, at least pg_stat_statements, should not be doing this double work often, and if it is it is due to heavy churn on entries. > >> I still think it'd be reasonable for us to include > >> ComputeConstantLengths in core to complete the picture of what we're > >> doing with _jumbleElements and the length field already anyway. Its > >> basically a way to fully hydrate the partially filled out JumbleState > >> from the initial jumble. > > > > I fully agree, ComputeConstantLengths is an optional post-jumble-query step > > for a consumer that wishes to calculate the lengths. The length calculation > > is not unique to a plug-in, so in my mind the work it's doing is core > > jumbling functionality. > > Okay. I could fall into that for this release. Marking the > JumbleState as a const is the most important piece here. I do agree. > I'm +-0 regarding this routine, but I can also see your point about how it's > useful to give at least the option to extensions to have a > recomputation of the Const lengths, the same way as PGSS. What are > the extensions that would use that? https://github.com/search?q=fill_in_constant_lengths&type=code A few well-known extensions/tools out there based on a Github search. -- Sami
Re: Refactor query normalization into core query jumbling
On Mon, Mar 30, 2026 at 02:20:47PM -0500, Sami Imseih wrote: >> I see where you're coming from on that, but I don't think we can >> remove anything here in practice: > > Yes. Not unless we want to rely on the parser to track the lengths in > Const, which could be invasive, but I have not looked into it. Hmm. We may not want to get down to that, still that would be cheaper than reprocessing the parsing of the query string twice. >> I still think it'd be reasonable for us to include >> ComputeConstantLengths in core to complete the picture of what we're >> doing with _jumbleElements and the length field already anyway. Its >> basically a way to fully hydrate the partially filled out JumbleState >> from the initial jumble. > > I fully agree, ComputeConstantLengths is an optional post-jumble-query step > for a consumer that wishes to calculate the lengths. The length calculation > is not unique to a plug-in, so in my mind the work it's doing is core > jumbling functionality. Okay. I could fall into that for this release. Marking the JumbleState as a const is the most important piece here. I'm +-0 regarding this routine, but I can also see your point about how it's useful to give at least the option to extensions to have a recomputation of the Const lengths, the same way as PGSS. What are the extensions that would use that? -- Michael signature.asc Description: PGP signature
Re: Refactor query normalization into core query jumbling
> > > I agree that ComputeConstantLengths should be in core. This one is > > > not a gray area IMO. The query jumble already records constant locations, > > > but leaves the lengths unset. ComputeConstantLengths is just the > > > completion of that work. There could be no other interpretation, > > > unlike generate_normalized_query, of what the lengths should be. > > > > This is an interesting remark. One problem with any patches presented yet > > is that we don’t attach the lengths as part of the in-core query jumbling > > procedure: we plug the lengths later using the same structure as the query > > jumbling. It seems to me that this is half-baked overall. I think that we > > don’t want to pay the extra length computation in the core query jumbling > > at the end, then why does it make sense to include the lengths in the > > JumbleState structure at all? Shouldn’t we use a different structure filled > > inside PGSS for this purpose rather than reuse the same thing for PGSS and > > the in-core part (don’t have the code in front of me now). > > I see where you're coming from on that, but I don't think we can > remove anything here in practice: Yes. Not unless we want to rely on the parser to track the lengths in Const, which could be invasive, but I have not looked into it. Paying the length computation at the end of every jumble is also going to impact performance, so that will not be a good idea. > I still think it'd be reasonable for us to include > ComputeConstantLengths in core to complete the picture of what we're > doing with _jumbleElements and the length field already anyway. Its > basically a way to fully hydrate the partially filled out JumbleState > from the initial jumble. I fully agree, ComputeConstantLengths is an optional post-jumble-query step for a consumer that wishes to calculate the lengths. The length calculation is not unique to a plug-in, so in my mind the work it's doing is core jumbling functionality. -- Sami Imseih Amazon Web Services (AWS)
Re: Refactor query normalization into core query jumbling
On Fri, Mar 27, 2026 at 6:42 PM Michael Paquier wrote:
>
> > On Mar 28, 2026, at 2:09, Sami Imseih wrote:
> > I agree that ComputeConstantLengths should be in core. This one is
> > not a gray area IMO. The query jumble already records constant locations,
> > but leaves the lengths unset. ComputeConstantLengths is just the
> > completion of that work. There could be no other interpretation,
> > unlike generate_normalized_query, of what the lengths should be.
>
> This is an interesting remark. One problem with any patches presented yet is
> that we don’t attach the lengths as part of the in-core query jumbling
> procedure: we plug the lengths later using the same structure as the query
> jumbling. It seems to me that this is half-baked overall. I think that we
> don’t want to pay the extra length computation in the core query jumbling at
> the end, then why does it make sense to include the lengths in the
> JumbleState structure at all? Shouldn’t we use a different structure filled
> inside PGSS for this purpose rather than reuse the same thing for PGSS and
> the in-core part (don’t have the code in front of me now).
I see where you're coming from on that, but I don't think we can
remove anything here in practice:
typedef struct LocationLen
{
intlocation; /* Required */
intlength; /* Set by _jumbleElements */
boolsquashed; /* Set by RecordConstLocation called from
_jumbleElements */
boolextern_param; /* Set by RecordConstLocation called
from _jumbleParam */
} LocationLen;
typedef struct JumbleState
{
unsigned char *jumble; /* Required */
Sizejumble_len; /* Required */
LocationLen *clocations; /* Required to track constant locations */
intclocations_buf_size; /* Required to track constant
locations */
intclocations_count; /* Required to track constant locations */
inthighest_extern_param_id; /* Set by _jumbleParam */
boolhas_squashed_lists; /* Set by _jumbleElements */
unsigned int pending_nulls; /* Required */
Sizetotal_jumble_len; /* Required */
} JumbleState;
The only refactoring I could think of is to write out the
_jumbleElements information separately, then we could actually drop
length, and maybe squashed, from LocationLen. But I'm not sure that
really buys us much? It would be more clear I guess, because the mixed
locations of where length gets set is indeed a bit odd.
I still think it'd be reasonable for us to include
ComputeConstantLengths in core to complete the picture of what we're
doing with _jumbleElements and the length field already anyway. Its
basically a way to fully hydrate the partially filled out JumbleState
from the initial jumble.
Thanks,
Lukas
--
Lukas Fittl
Re: Refactor query normalization into core query jumbling
> On Mar 28, 2026, at 2:09, Sami Imseih wrote: > I agree that ComputeConstantLengths should be in core. This one is > not a gray area IMO. The query jumble already records constant locations, > but leaves the lengths unset. ComputeConstantLengths is just the > completion of that work. There could be no other interpretation, > unlike generate_normalized_query, of what the lengths should be. This is an interesting remark. One problem with any patches presented yet is that we don’t attach the lengths as part of the in-core query jumbling procedure: we plug the lengths later using the same structure as the query jumbling. It seems to me that this is half-baked overall. I think that we don’t want to pay the extra length computation in the core query jumbling at the end, then why does it make sense to include the lengths in the JumbleState structure at all? Shouldn’t we use a different structure filled inside PGSS for this purpose rather than reuse the same thing for PGSS and the in-core part (don’t have the code in front of me now). -- Michael
Re: Refactor query normalization into core query jumbling
> On Thu, Mar 26, 2026 at 10:18 PM Michael Paquier wrote: > > This line of arguments is stronger for the normalization of the query > > string. Why should the core code decide what a normalized string > > should look like when it comes to the detection of the constants, if > > any? Instead of a dollar-quoted number, we could enforce a bunch of > > things, like a '?' or a '$woozah$' at these locations. > > Fair enough, though I haven't seen any extensions that do that in > practice - its reasonable to have normalization result in a query > string that's parsable again and can be passed to EXPLAIN > (GENERIC_PLAN). with regards to generate_normalized_query, AFAICT, the most common case is extensions are using it for is dollar quoted number, but I agree this one is a gray area. > What if we only put the ComputeConstantLengths (as Sami had it in v7) > in core, together with making JumbleState const? I agree that ComputeConstantLengths should be in core. This one is not a gray area IMO. The query jumble already records constant locations, but leaves the lengths unset. ComputeConstantLengths is just the completion of that work. There could be no other interpretation, unlike generate_normalized_query, of what the lengths should be. -- Sami Imseih Amazon Web Services (AWS)
Re: Refactor query normalization into core query jumbling
On Thu, Mar 26, 2026 at 10:18 PM Michael Paquier wrote: > This line of arguments is stronger for the normalization of the query > string. Why should the core code decide what a normalized string > should look like when it comes to the detection of the constants, if > any? Instead of a dollar-quoted number, we could enforce a bunch of > things, like a '?' or a '$woozah$' at these locations. Fair enough, though I haven't seen any extensions that do that in practice - its reasonable to have normalization result in a query string that's parsable again and can be passed to EXPLAIN (GENERIC_PLAN). > Saying that, the key point of the patch is to take a copy of the > JumbleState locations, and recompute it for the needs of PGSS for the > sake of the normalized query representation. Hence, why don't we just > do that at the end? That should be enough to enforce the const marker > for the JumbleState across all the loaded extensions that want to look > at it. This leads me to the simpler patch attached. > > Comments or tomatoes? That's simpler, and I'd be OK with just that > for v19. That would be much better than the current state of affairs, > where PGSS decides to enforce its own ideas in the JumbleState, ideas > fed to anything looping into the post-parse-analyze hook after PGSS. I'm not convinced that making the const change alone is a good idea, without also providing some helpers to reduce the repeated code in extensions. What if we only put the ComputeConstantLengths (as Sami had it in v7) in core, together with making JumbleState const? Thanks, Lukas -- Lukas Fittl
Re: Refactor query normalization into core query jumbling
On Thu, Mar 26, 2026 at 06:50:20PM -0700, Lukas Fittl wrote: > On Thu, Mar 26, 2026 at 10:19 AM Sami Imseih wrote: > > Both of those changes belong in 0002. See the attached v7. > > Looks good! > > I've also done a quick follow-up test with pg_tracing and that > simplifies its logic as expected [0] to be able to extract inline > parameter values. I have been looking at what you have here. As mentioned upthread, I am on board with aiming for JumbleState to be a const so as we enforce as policy that no extension is allowed to touch what the core code has computed. However, I am not convinced by most of the patch. The logic to recompute the lengths of the constants is a concept proper to PGSS. Perhaps we could reconsider moving that into core at some point, but I am honestly not sure to see the range of benefits we'd gain from that. This line of arguments is stronger for the normalization of the query string. Why should the core code decide what a normalized string should look like when it comes to the detection of the constants, if any? Instead of a dollar-quoted number, we could enforce a bunch of things, like a '?' or a '$woozah$' at these locations. Saying that, the key point of the patch is to take a copy of the JumbleState locations, and recompute it for the needs of PGSS for the sake of the normalized query representation. Hence, why don't we just do that at the end? That should be enough to enforce the const marker for the JumbleState across all the loaded extensions that want to look at it. This leads me to the simpler patch attached. Comments or tomatoes? That's simpler, and I'd be OK with just that for v19. That would be much better than the current state of affairs, where PGSS decides to enforce its own ideas in the JumbleState, ideas fed to anything looping into the post-parse-analyze hook after PGSS. -- Michael From 336e6aa38d6dc3901c3fa1f746796a0b8fc87cd2 Mon Sep 17 00:00:00 2001 From: Michael Paquier Date: Fri, 27 Mar 2026 14:13:00 +0900 Subject: [PATCH v8] Make JumbleState const in post_parse_analyze hook Change the post_parse_analyze_hook_type signature to take a const JumbleState parameter, preventing hooks from modifying the jumble state during query analysis. This improves API safety by making it clear that hooks should only read from the jumble state, not modify it. Update pg_stat_statements and related functions to match the new const signature. Refactor fill_in_constant_lengths() to return a newly allocated LocationLen array instead of modifying the JumbleState, so as PGSS does not touch the now-const JumbleState. The routine is renamed to compute_constant_lengths(). Discussion: https://postgr.es/m/caa5rz0tzp5qu0ikzeeqjnxvdsngh1dwv80sb-k4qaumimoo...@mail.gmail.com --- src/include/parser/analyze.h | 2 +- .../pg_stat_statements/pg_stat_statements.c | 75 --- 2 files changed, 50 insertions(+), 27 deletions(-) diff --git a/src/include/parser/analyze.h b/src/include/parser/analyze.h index e10270ff0ffd..b2db6fa4e8c4 100644 --- a/src/include/parser/analyze.h +++ b/src/include/parser/analyze.h @@ -21,7 +21,7 @@ /* Hook for plugins to get control at end of parse analysis */ typedef void (*post_parse_analyze_hook_type) (ParseState *pstate, Query *query, - JumbleState *jstate); + const JumbleState *jstate); extern PGDLLIMPORT post_parse_analyze_hook_type post_parse_analyze_hook; diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c index 6cb14824ec3b..0b6be5e255a4 100644 --- a/contrib/pg_stat_statements/pg_stat_statements.c +++ b/contrib/pg_stat_statements/pg_stat_statements.c @@ -335,7 +335,7 @@ static void pgss_shmem_request(void); static void pgss_shmem_startup(void); static void pgss_shmem_shutdown(int code, Datum arg); static void pgss_post_parse_analyze(ParseState *pstate, Query *query, - JumbleState *jstate); + const JumbleState *jstate); static PlannedStmt *pgss_planner(Query *parse, const char *query_string, int cursorOptions, @@ -359,7 +359,7 @@ static void pgss_store(const char *query, int64 queryId, const BufferUsage *bufusage, const WalUsage *walusage, const struct JitInstrumentation *jitusage, - JumbleState *jstate, +
Re: Refactor query normalization into core query jumbling
On Thu, Mar 26, 2026 at 10:19 AM Sami Imseih wrote: > Both of those changes belong in 0002. See the attached v7. Looks good! I've also done a quick follow-up test with pg_tracing and that simplifies its logic as expected [0] to be able to extract inline parameter values. Thanks, Lukas [0]: https://github.com/lfittl/pg_tracing/commit/ae937fdee4aa57206b31b5746f36dd8e55cf43d1#diff-41cf816684c420b8808fb4899e38a38aaf4f875613c4785e291010aa2b0ea267 -- Lukas Fittl
Re: Refactor query normalization into core query jumbling
> 1) I think "SetConstantLengths" should be renamed to > "GetConstantLengths", or "CalculateConstantLengths" in 0002, since > we're no longer modifying JumbleState Makes sense. I renamed it to ComputeConstantLengths; as this will reflect that it's more than a getter function, and it's doing actual work. Compute also seems more consistent with other function names we have out there. > 2) I wonder if we should export this function in 0002 - that would > specifically help pg_tracing, which also wants to extract inline > parameter values in addition to replacing them with a $n parameter > reference marker - I could also see that being useful for any other > extension that's interested in pulling out parameter values > I've also done some testing with two previously mentioned extensions, > pg_stat_monitor [0] and pg_tracing [1]. Both look like they can be > adapted to the new method with their regression tests succeeding. That did cross my mind. I agree with this. Both of those changes belong in 0002. See the attached v7. -- Sami Imseih Amazon Web Services (AWS) v7-0001-pg_stat_statements-Move-query-normalization-to-co.patch Description: Binary data v7-0002-Make-JumbleState-const-in-post_parse_analyze-hook.patch Description: Binary data
Re: Refactor query normalization into core query jumbling
Hi Sami, On Wed, Mar 25, 2026 at 8:31 PM Sami Imseih wrote: > > v6 addresses the comments. > Great - I've tested that again and changes look good. Two more thoughts: 1) I think "SetConstantLengths" should be renamed to "GetConstantLengths", or "CalculateConstantLengths" in 0002, since we're no longer modifying JumbleState 2) I wonder if we should export this function in 0002 - that would specifically help pg_tracing, which also wants to extract inline parameter values in addition to replacing them with a $n parameter reference marker - I could also see that being useful for any other extension that's interested in pulling out parameter values I've also done some testing with two previously mentioned extensions, pg_stat_monitor [0] and pg_tracing [1]. Both look like they can be adapted to the new method with their regression tests succeeding. Thanks, Lukas [0]: https://github.com/lfittl/pg_stat_monitor/commit/d3521de9f6736c1bb745bc31dae736540efe1781 [1]: https://github.com/lfittl/pg_tracing/commit/ecff95a87a1c60e8a5fe47662cc6a2148bd3632f -- Lukas Fittl
Re: Refactor query normalization into core query jumbling
Thanks for looking! > I don't think this is needed anymore, as of > 45762084545ec14dbbe66ace1d69d7e89f8978ac. Correct. That showed up after my last rebase. > > +/* > > + * Callback to generate a normalized version of the query string that will > > be used to > > + * represent all similar queries. > > + * > > I don't think the term "Callback" makes sense here - I think you could > just keep the original wording. This was a remnant of my earlier experimentation. I removed. A few notes on the comments: - * Note that the normalized representation may well vary depending on - * just which "equivalent" query is used to create the hashtable entry. - * We assume this is OK. This was in the original generate_normalize_query and since it mentions hashtable entry, I moved the comment (in-spirit) to where pg_stat_statements calls GenerateNormailzeQuery * If query_loc > 0, then "query" has been advanced by that much compared to * the original string start, as is the case with multi-statement strings, so * we need to translate the provided locations to compensate. (This lets us * avoid re-scanning statements before the one of interest, so it's worth doing.) * This comment was originally duplicated in both SetConstantLengths, so I just kept it as-is in SetConstantLengths and added a shorter reference in GenerateNormalizeQuery Also, this comment "It is the caller's job to ensure that the string is a valid SQL statement..." made more sense in GenerateNormalizeQuery rather than SetConstantLengths, since GenerateNormalizeQuery is the public function. > In 0002: > You could use palloc_array for locs here. done. > I think we should update the comment here to reflect the fact that > we're no longer modifying JumbleState. done. > Otherwise these patches look good - it'd be nice to still get this > into 19 so we have less code duplication across the different > extensions working with normalized query text. I agree! v6 addresses the comments. -- Sami Imseih Amazon Web Services (AWS) v6-0002-Make-JumbleState-const-in-post_parse_analyze-hook.patch Description: Binary data v6-0001-pg_stat_statements-Move-query-normalization-to-co.patch Description: Binary data
Re: Refactor query normalization into core query jumbling
Hi Sami, On Mon, Mar 16, 2026 at 7:12 PM Sami Imseih wrote: > > > Here is v4. > > > > 0001 - addresses the comments made by Bertrand. > > 0002 - makes JumbleState a constant when passed to post_parse_analyze > > and updates all downstream code that consume the JumbleState. This > > means we now need to copy the locations from JState into a local/temporary > > array when generating the normalized string. In 0001: > diff --git a/src/backend/nodes/queryjumblefuncs.c > b/src/backend/nodes/queryjumblefuncs.c > > index 87db8dc1a32..d4b26202c47 100644 > --- a/src/backend/nodes/queryjumblefuncs.c > +++ b/src/backend/nodes/queryjumblefuncs.c > ... > @@ -773,3 +775,249 @@ _jumbleRangeTblEntry_eref(JumbleState *jstate, > + > +/* we don't want to re-emit any escape string warnings */ > +yyextra.escape_string_warning = false; > + I don't think this is needed anymore, as of 45762084545ec14dbbe66ace1d69d7e89f8978ac. > +/* > + * Callback to generate a normalized version of the query string that will > be used to > + * represent all similar queries. > + * I don't think the term "Callback" makes sense here - I think you could just keep the original wording. In 0002: > diff --git a/src/backend/nodes/queryjumblefuncs.c > b/src/backend/nodes/queryjumblefuncs.c > index d4b26202c47..fe8f0ccd278 100644 > --- a/src/backend/nodes/queryjumblefuncs.c > +++ b/src/backend/nodes/queryjumblefuncs.c > ... > @@ -813,14 +815,20 @@ SetConstantLengths(JumbleState *jstate, const char > *query, > core_YYSTYPE yylval; > YYLTYPEyylloc; > > +if (jstate->clocations_count == 0) > +return NULL; > + > +/* Copy constant locations to avoid modifying jstate */ > +locs = palloc(jstate->clocations_count * sizeof(LocationLen)); > +memcpy(locs, jstate->clocations, jstate->clocations_count * > sizeof(LocationLen)); > + You could use palloc_array for locs here. > @@ -938,12 +948,13 @@ GenerateNormalizedQuery(JumbleState *jstate, const char > *query, > last_off = 0,/* Offset from start for previous tok */ > last_tok_len = 0;/* Length (in bytes) of that tok */ > intnum_constants_replaced = 0; > +LocationLen *locs = NULL; > > /* > * Set constants' lengths in JumbleState, as only locations are set during > * DoJumble(). Note this also ensures the items are sorted by location. > */ > -SetConstantLengths(jstate, query, query_loc); > +locs = SetConstantLengths(jstate, query, query_loc); I think we should update the comment here to reflect the fact that we're no longer modifying JumbleState. Otherwise these patches look good - it'd be nice to still get this into 19 so we have less code duplication across the different extensions working with normalized query text. Thanks, Lukas -- Lukas Fittl
Re: Refactor query normalization into core query jumbling
> Here is v4. > > 0001 - addresses the comments made by Bertrand. > 0002 - makes JumbleState a constant when passed to post_parse_analyze > and updates all downstream code that consume the JumbleState. This > means we now need to copy the locations from JState into a local/temporary > array when generating the normalized string. PFA a mandatory rebase. -- Sami Imseih Amazon Web Services (AWS) v5-0002-Make-JumbleState-const-in-post_parse_analyze-hook.patch Description: Binary data v5-0001-pg_stat_statements-Move-query-normalization-to-co.patch Description: Binary data
Re: Refactor query normalization into core query jumbling
On Tue, Dec 30, 2025 at 01:31:30PM +0900, Michael Paquier wrote: > Saying that, yes I think that you are right, we should be able to > remove this dependency on JumbleState in > pg_hint_plan_post_parse_analyze() and rely on the query ID. I'm > writing down a note about cleaning that on the HEAD branch of the > module.. For the note, I have looked at that again and this improvement is right, so I have pushed something to simplify the code: https://github.com/ossc-db/pg_hint_plan/commit/3a63a56740e48e1d205294d5df6e198716718882 Thanks for the suggestion, Lukas. -- Michael signature.asc Description: PGP signature
Re: Refactor query normalization into core query jumbling
> >> backwards compatible. Basically, we keep JumbleState a non-constant, > >> but provide core APIs for any operation, such as > >> generate_normalized_query, > >> that needs to modify the state. So, my approach was not about enforcing a > >> read-only JumbleState, but about providing the API to dissuade an author > >> from modifying a JumbleState. > > > Given the lack of public APIs to modify JumbleState today, I don't see how > > an extension would do > > modifications in a meaningful way, short of copying the code. I think we > > should be a bit bolder here in > > enforcing a convention, either clearly making it read-only or dropping the > > argument again. > > Based on the discussion so far I am leaning towards making JumbleState > read-only as described here [0]. I don't see how we can drop JumbleState > completely from hooks, since normalization needs to occur on-demand > by the extension. Here is v4. 0001 - addresses the comments made by Bertrand. 0002 - makes JumbleState a constant when passed to post_parse_analyze and updates all downstream code that consume the JumbleState. This means we now need to copy the locations from JState into a local/temporary array when generating the normalized string. -- Sami Imseih Amazon Web Services (AWS) v4-0002-Make-JumbleState-const-in-post_parse_analyze-hook.patch Description: Binary data v4-0001-pg_stat_statements-Move-query-normalization-to-co.patch Description: Binary data
Re: Refactor query normalization into core query jumbling
>> backwards compatible. Basically, we keep JumbleState a non-constant, >> but provide core APIs for any operation, such as >> generate_normalized_query, >> that needs to modify the state. So, my approach was not about enforcing a >> read-only JumbleState, but about providing the API to dissuade an author >> from modifying a JumbleState. > Given the lack of public APIs to modify JumbleState today, I don't see how > an extension would do > modifications in a meaningful way, short of copying the code. I think we > should be a bit bolder here in > enforcing a convention, either clearly making it read-only or dropping the > argument again. Based on the discussion so far I am leaning towards making JumbleState read-only as described here [0]. I don't see how we can drop JumbleState completely from hooks, since normalization needs to occur on-demand by the extension. > 1) An extension that wants to display normalized query strings > > This seems to be the biggest kind of what I can find with code search. > Extensions like pg_stat_monitor [1], that > want to do something like pg_stat_statements, and have to copy a bunch of > normalization code today that is 1:1 what > Postgres does. Such extensions don't need the JumbleState argument if they > can get the normalized text directly. Yes, I don't know how that's possible; besides generating the normalized string during JumbleQuery and making it available to post_parse_analyze hook ( and other executor hooks ). But this also means we are incurring the normalization overhead for every execution. > 2) An extension that wants to capture parameter values > > Some extensions may want to remember additional context for normalized > queries, like pg_tracing's logic for > optionally passing parameter values (post normalization) in the trace > context [2]. If we kept passing a read-only > JumbleState then such extensions could presumably still get this, but I > wonder if it wouldn't be better for core to > have a helper for this? This could be like a core GenerateNormalizedQuery which can optionally track the constant values. That will be an enhancement to normalization and a new requirement. [0] https://www.postgresql.org/message-id/CAA5RZ0sbWmqdUBFo8JXMJe72pnwjxVY58htJ6pKbwnyQuRctQw%40mail.gmail.com -- Sami Imseih Amazon Web Services (AWS)
Re: Refactor query normalization into core query jumbling
> === 1 > > + SetConstantLengths((JumbleState *) jstate, query, query_loc); > > This cast seems unnecessary. Right. This is unnecessary. I will remove in the next iteration. > === 2 > > +CompLocation(const void *a, const void *b) > > In the commit message I can see "Functions are renamed to match core naming > conventions" but wasn't comp_location() better? Not sure if it's better or worse. I aimed for consistency here and used pascal case to match. > === 3 > > + /* > +* generate the normalized query. Note that the > normalized > +* representation may well vary depending on just > which > +* "equivalent" query is used to create the hashtable > entry. We > +* assume this is OK. > +*/ > + norm_query = GenerateNormalizedQuery(jstate, query, > > Should part of this comment be on top of the GenerateNormalizedQuery() > definition instead? No, because this talks about the hashtable entry which is specific to pg_stat_statements. -- Sami Imseih Amazon Web Services (AWS)
Re: Refactor query normalization into core query jumbling
Hi, On Tue, Dec 23, 2025 at 10:35:18AM -0600, Sami Imseih wrote: > v3 implements this approach without a callback. This establishes a clear > boundary: core owns JumbleState modifications, extensions consume the > results through the API. > Thanks for the new patch version. Some random comments: === 1 + SetConstantLengths((JumbleState *) jstate, query, query_loc); This cast seems unnecessary. === 2 +CompLocation(const void *a, const void *b) In the commit message I can see "Functions are renamed to match core naming conventions" but wasn't comp_location() better? === 3 + /* +* generate the normalized query. Note that the normalized +* representation may well vary depending on just which +* "equivalent" query is used to create the hashtable entry. We +* assume this is OK. +*/ + norm_query = GenerateNormalizedQuery(jstate, query, Should part of this comment be on top of the GenerateNormalizedQuery() definition instead? Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Refactor query normalization into core query jumbling
On Mon, Dec 29, 2025 at 06:34:18PM -0800, Lukas Fittl wrote: > I noticed pg_hint_plan checks the JumbleState argument to determine whether > or not to run an early check of the hint logic. I'm a little > confused by this (and if this is about query IDs being available, > couldn't it just check the Query having a non-zero queryID?), but > FWIW: [3] See also the code before 32ced2e13e75. We wanted a JumbleState in this case to be able to generate a normalized query for the hint table. Since this commit we only rely on the query ID in the hint table, so it would not matter for pg_hint_plan if the JumbleState is removed from this portions of the hooks. Saying that, yes I think that you are right, we should be able to remove this dependency on JumbleState in pg_hint_plan_post_parse_analyze() and rely on the query ID. I'm writing down a note about cleaning that on the HEAD branch of the module.. -- Michael signature.asc Description: PGP signature
Re: Refactor query normalization into core query jumbling
On Wed, Dec 24, 2025 at 9:33 AM Sami Imseih wrote: > > that would be pushed across more stacks of the > > post-parse-analyze hook. Doing that would allow us to pull the plug > > into making JumbleState and the original copies of clocations a set of > > consts when given to extensions. > > Yes correct. The hook signature will turn into, so all extensions now > just have a constant pointer to the jumblestate. They can of course > work hard to cast away constants, but at that point, they are doing > things the wrong way. > > ``` > typedef void (*post_parse_analyze_hook_type) (ParseState *pstate, > > Query *query, > > const JumbleState *jstate); > ``` > > This of course will be a big breaking change to all the extensions out > there using this hook. This is not just about a signature change of > the hook, but an author now has to figure out how to deal with a > constant JumbleState in their code, which may not be trivial. > I wonder if there is a single extension out there that actually utilizes JumbleState in that hook - it was added in 5fd9dfa5f50e4906c35133a414ebec5b6d518493 presumably because pg_stat_statements needed it, but even Julien at the time was critical of adding it, mainly considering it for pgss needs if I read the archive correctly [0]. CCing Julien to correct me if I misinterpret the historic record here. Reading through the discussion in this thread here I do wonder a bit if we shouldn't just take that out of the hook again, and instead provide separate functions for getting the normalized query string (generating it the first time its called). My proposal in v3 was a bit more softer, and keeps the hook > backwards compatible. Basically, we keep JumbleState a non-constant, > but provide core APIs for any operation, such as > generate_normalized_query, > that needs to modify the state. So, my approach was not about enforcing a > read-only JumbleState, but about providing the API to dissuade an author > from modifying a JumbleState. Given the lack of public APIs to modify JumbleState today, I don't see how an extension would do modifications in a meaningful way, short of copying the code. I think we should be a bit bolder here in enforcing a convention, either clearly making it read-only or dropping the argument again. Thinking through a couple of theoretical use cases: 1) An extension that wants to display normalized query strings This seems to be the biggest kind of what I can find with code search. Extensions like pg_stat_monitor [1], that want to do something like pg_stat_statements, and have to copy a bunch of normalization code today that is 1:1 what Postgres does. Such extensions don't need the JumbleState argument if they can get the normalized text directly. 2) An extension that wants to capture parameter values Some extensions may want to remember additional context for normalized queries, like pg_tracing's logic for optionally passing parameter values (post normalization) in the trace context [2]. If we kept passing a read-only JumbleState then such extensions could presumably still get this, but I wonder if it wouldn't be better for core to have a helper for this? 3) An extension that modifies the JumbleState to cause different normalization to happen I'm not aware of any extensions doing this, and I couldn't find any either. And since such theoretical extensions can directly modify query->queryId in the same hook, why not do that? 4) Extensions using the presence of JumbleState to determine whether query IDs are available (?) I noticed pg_hint_plan checks the JumbleState argument to determine whether or not to run an early check of the hint logic. I'm a little confused by this (and if this is about query IDs being available, couldn't it just check the Query having a non-zero queryID?), but FWIW: [3] [0]: https://www.postgresql.org/message-id/flat/CAOBaU_ZbeQrUESCuLGk3sRZWxXFGaBBO39CxSsFxLeZGicUrJw%40mail.gmail.com#9a9bc47aa1b4b25ee2e3de1388c4c346 [1]: https://github.com/percona/pg_stat_monitor/blob/main/pg_stat_monitor.c#L476 [2]: https://github.com/DataDog/pg_tracing/blob/main/src/pg_tracing_query_process.c#L428 [3]: https://github.com/ossc-db/pg_hint_plan/blob/master/pg_hint_plan.c#L3111 Thanks, Lukas -- Lukas Fittl
Re: Refactor query normalization into core query jumbling
> > For the second part of making more jumbling utilities global, this can > > be taken up in another thread. I am now thinking we would be better > > off actually taking all the generic jumbling routines and separating > > them into their own C file. So we can have a new file like primjumble.c > > which will have all the "primitive" jumbling utilities, and > > queryjumblefuncs.c > > can worry about its core business of jumbling a Query tree. Anyhow, > > these are just some high level thoughts on this part for now. > > Hmm. Moving from pgss to the core backup the code that updates the > clocations of an existing JumbleState does not solve the issue that > we should not modify the internals of the JumbleState computed. So > this does not move the ball at all. This is for the point "I am wondering if we should not expose a bit more the jumble query APIs" [0], which as I mention is a separate topic and is not about extensions being able to modify JumbleState. > The updates of clocations depend on GenerateNormalizedQuery(), which > depends on the pre-work done in CleanQuerytext() to get a query string > and its location for a multi-query string case. If we really want to > make a JumbleState not touchable at all, we could either push more > work of CleanQuerytext() and GenerateNormalizedQuery() into core. When you say “push more work into core,” I understand that to mean this work should occur during jumbling. If so, there are several complications. 1/ CleanQueryText() requires access to the query text, which we do not have during core jumbling as of 2ecbb0a4935. 2/ GenerateNormalizedQuery() should be invoked on demand by the extension that needs the normalized string, for example when pg_stat_statements needs to store a query string. Both operations are expensive, especially GenerateNormalizedQuery(). Doing this work on every JumbleQuery() would introduce unnecessary overhead. > Or a second thing that we can do, which would be actually much > simpler, is to rework entirely fill_in_constant_lengths() so as we > generate a *copy* of the location data PGSS wants rather than force it > into a JumbleState yes, that's an idea that did cross my mind. I have hesitation of copying clocations, but that may not be such a big deal, since it will only be occurring on demand when the extension requests a normalized string. > that would be pushed across more stacks of the > post-parse-analyze hook. Doing that would allow us to pull the plug > into making JumbleState and the original copies of clocations a set of > consts when given to extensions. Yes correct. The hook signature will turn into, so all extensions now just have a constant pointer to the jumblestate. They can of course work hard to cast away constants, but at that point, they are doing things the wrong way. ``` typedef void (*post_parse_analyze_hook_type) (ParseState *pstate, Query *query, const JumbleState *jstate); ``` This of course will be a big breaking change to all the extensions out there using this hook. This is not just about a signature change of the hook, but an author now has to figure out how to deal with a constant JumbleState in their code, which may not be trivial. My proposal in v3 was a bit more softer, and keeps the hook backwards compatible. Basically, we keep JumbleState a non-constant, but provide core APIs for any operation, such as generate_normalized_query, that needs to modify the state. So, my approach was not about enforcing a read-only JumbleState, but about providing the API to dissuade an author from modifying a JumbleState. If we need a stronger API contract, then we can go with the hook receving JumbleState as a constant and implement the copy of clocations to return back to extensions that need to normalize a query. We also have to keep in mind that if there are future requirements where the state has to be modified by an extension, we will need to implement more copy functions for those field. [0] https://www.postgresql.org/message-id/aUXeTEpSymo6n7lF%40paquier.xyz -- Sami Imseih Amazon Web Services (AWS)
Re: Refactor query normalization into core query jumbling
On Tue, Dec 23, 2025 at 10:35:18AM -0600, Sami Imseih wrote:
> > Though this may be tangential to the current topic, I have long been
> > wanting to revise the two instances of `Assert(len_to_wrt >= 0);`
> > in the code to the implementation below. Would you kindly advise if this
> > modification is worthwhile?
> >
> > ```
> > if (len_to_wrt > 0)
> > {
> > memcpy(norm_query + n_quer_loc, query + quer_loc,
> > len_to_wrt);
> > n_quer_loc += len_to_wrt;
> > }
> > ```
>
> I am not sure I like this idea. The len_to_wrt being less than 0
> indicates a bug which will be hidden behind such a loop. I
> think it's better to keep the Assert as-is.
This set of asserts are critical to keep. An incorrect computation is
a synonym of an out-of-bound write in this case, which would classify
any problem as something that could be worth a CVE.
> v3 implements this approach without a callback. This establishes a clear
> boundary: core owns JumbleState modifications, extensions consume the
> results through the API.
>
> I kept the post_parse_analyze hook signature unchanged since
> GenerateNormalizedQuery still needs to modify JumbleState
> (to populate constant lengths). Making it const would be misleading.
>
> For the second part of making more jumbling utilities global, this can
> be taken up in another thread. I am now thinking we would be better
> off actually taking all the generic jumbling routines and separating
> them into their own C file. So we can have a new file like primjumble.c
> which will have all the "primitive" jumbling utilities, and queryjumblefuncs.c
> can worry about its core business of jumbling a Query tree. Anyhow,
> these are just some high level thoughts on this part for now.
Hmm. Moving from pgss to the core backup the code that updates the
clocations of an existing JumbleState does not solve the issue that
we should not modify the internals of the JumbleState computed. So
this does not move the ball at all.
The updates of clocations depend on GenerateNormalizedQuery(), which
depends on the pre-work done in CleanQuerytext() to get a query string
and its location for a multi-query string case. If we really want to
make a JumbleState not touchable at all, we could either push more
work of CleanQuerytext() and GenerateNormalizedQuery() into core.
Or a second thing that we can do, which would be actually much
simpler, is to rework entirely fill_in_constant_lengths() so as we
generate a *copy* of the location data PGSS wants rather than force it
into a JumbleState that would be pushed across more stacks of the
post-parse-analyze hook. Doing that would allow us to pull the plug
into making JumbleState and the original copies of clocations a set of
consts when given to extensions.
--
Michael
signature.asc
Description: PGP signature
Re: Refactor query normalization into core query jumbling
On Wed, Dec 24, 2025 at 10:50:16AM +0800, zengman wrote: > Besides, it seems you accidentally forgot to cc Michael. This is not a problem. I still get the messages by being registered to this list :D -- Michael signature.asc Description: PGP signature
Re: Refactor query normalization into core query jumbling
> I am not sure I like this idea. The len_to_wrt being less than 0 > indicates a bug which will be hidden behind such a loop. I > think it's better to keep the Assert as-is. Well, my main concern is that assertions are unlikely to be enabled in production environments. Also, the new patch looks really good. Besides, it seems you accidentally forgot to cc Michael. -- Regards, Man Zeng www.openhalo.org
Re: Refactor query normalization into core query jumbling
> Though this may be tangential to the current topic, I have long been wanting
> to revise the two instances of `Assert(len_to_wrt >= 0);`
> in the code to the implementation below. Would you kindly advise if this
> modification is worthwhile?
>
> ```
> if (len_to_wrt > 0)
> {
> memcpy(norm_query + n_quer_loc, query + quer_loc, len_to_wrt);
> n_quer_loc += len_to_wrt;
> }
> ```
I am not sure I like this idea. The len_to_wrt being less than 0
indicates a bug which will be hidden behind such a loop. I
think it's better to keep the Assert as-is.
> > > This way, any extension that wishes to return a normalized string from
> > > the same JumbleState can invoke this callback and get consistent results.
> > > pg_stat_statements and other extensions with a need to normalize a query
> > > string based on the locations of a JumbleState do not need to care about
> > > the
> > > internals of normalization, they simply invoke the callback and
> > > receive the final
> > > string.
> >
> > Hmm. I did not wrap completely my head with your problem, but,
> > assuming that what you are proposing goes in the right direction,
>
> The first goal is to move all query-normalization-related infrastructure
> that pg_stat_statements (and other extensions) rely on into core, so
> extensions no longer need to copy or reimplement normalization logic and
> can all depend on a single, shared implementation.
>
> In addition, query normalization necessarily modifies JumbleState (to
> record constant locations and lengths). This responsibility should not
> fall to extensions and should instead be delegated to core. I will argue
> that the current design, in which extensions handle this directly, is a
> layering violation.
>
> As a first step, we can move generate_normalized_query to core as a global
> function, allowing extensions to simply call it.
v3 implements this approach without a callback. This establishes a clear
boundary: core owns JumbleState modifications, extensions consume the
results through the API.
I kept the post_parse_analyze hook signature unchanged since
GenerateNormalizedQuery still needs to modify JumbleState
(to populate constant lengths). Making it const would be misleading.
For the second part of making more jumbling utilities global, this can
be taken up in another thread. I am now thinking we would be better
off actually taking all the generic jumbling routines and separating
them into their own C file. So we can have a new file like primjumble.c
which will have all the "primitive" jumbling utilities, and queryjumblefuncs.c
can worry about its core business of jumbling a Query tree. Anyhow,
these are just some high level thoughts on this part for now.
--
Sami Imseih
Amazon Web Services (AWS)
v3-0001-pg_stat_statements-Move-query-normalization-to-co.patch
Description: Binary data
Re: Refactor query normalization into core query jumbling
Hi,
Though this may be tangential to the current topic, I have long been wanting to
revise the two instances of `Assert(len_to_wrt >= 0);`
in the code to the implementation below. Would you kindly advise if this
modification is worthwhile?
```
if (len_to_wrt > 0)
{
memcpy(norm_query + n_quer_loc, query + quer_loc, len_to_wrt);
n_quer_loc += len_to_wrt;
}
```
--
Regards,
Man Zeng
www.openhalo.org
Re: Refactor query normalization into core query jumbling
> > This way, any extension that wishes to return a normalized string from > > the same JumbleState can invoke this callback and get consistent results. > > pg_stat_statements and other extensions with a need to normalize a query > > string based on the locations of a JumbleState do not need to care about the > > internals of normalization, they simply invoke the callback and > > receive the final > > string. > > Hmm. I did not wrap completely my head with your problem, but, > assuming that what you are proposing goes in the right direction, The first goal is to move all query-normalization-related infrastructure that pg_stat_statements (and other extensions) rely on into core, so extensions no longer need to copy or reimplement normalization logic and can all depend on a single, shared implementation. In addition, query normalization necessarily modifies JumbleState (to record constant locations and lengths). This responsibility should not fall to extensions and should instead be delegated to core. I will argue that the current design, in which extensions handle this directly, is a layering violation. As a first step, we can move generate_normalized_query to core as a global function, allowing extensions to simply call it. > I am wondering if we should not expose a bit more the jumble query APIs so > as the normal default callback can be reused by out-of-core rather > than hide it entirely. This would mean exposing > GenerateNormalizedQuery(), which also giving a way for callers of > JumbleQuery() to pass down a custom callback? This would imply > thinking harder about the initialization state we expect in the > structure, but I think that we should try to design things so as > extensions do not need to copy-paste more code from the core tree at > the end, just less of it. ... and this will be taking the next step which is providing callbacks and making more jumbling utilities global. This will require more discussion, but I would think we would expose InitJumble() and it will do the bare minimum to initialize a JumbleState, and some fields that can define callbacks after the fact. There will be a callback for a normalization function and a callback function that will allow the user to implement jumbling functions for nodes that are currently not included in queryjumblefuncs.switch.c, or perhaps they can override the existing logic in this generated file. > Of course, this sentence is written with the same line of thoughts as > previously mentioned in the other thread we have discussed: extensions > should not be allowed to update a JumbleState after it's been set by > the backend code, so as once the same JumbleState pointer is passed > down across multiple extensions they don't get confused. If an > extension wants to use their own policy within the JumbleState, they > had better recreate a new independent one if they are unhappy about > has been generated previously. Yes, correct. If we provide the interface to create an additional JumbleState, they can create an independent state. For this thread, I would like to focus on the first goal. What do you think? -- Sami Imseih Amazon Web Services (AWS)
Re: Refactor query normalization into core query jumbling
On Fri, Dec 19, 2025 at 03:36:16PM -0600, Sami Imseih wrote: > This way, any extension that wishes to return a normalized string from > the same JumbleState can invoke this callback and get consistent results. > pg_stat_statements and other extensions with a need to normalize a query > string based on the locations of a JumbleState do not need to care about the > internals of normalization, they simply invoke the callback and > receive the final > string. Hmm. I did not wrap completely my head with your problem, but, assuming that what you are proposing goes in the right direction, I am wondering if we should not expose a bit more the jumble query APIs so as the normal default callback can be reused by out-of-core rather than hide it entirely. This would mean exposing GenerateNormalizedQuery(), which also giving a way for callers of JumbleQuery() to pass down a custom callback? This would imply thinking harder about the initialization state we expect in the structure, but I think that we should try to design things so as extensions do not need to copy-paste more code from the core tree at the end, just less of it. Of course, this sentence is written with the same line of thoughts as previously mentioned in the other thread we have discussed: extensions should not be allowed to update a JumbleState after it's been set by the backend code, so as once the same JumbleState pointer is passed down across multiple extensions they don't get confused. If an extension wants to use their own policy within the JumbleState, they had better recreate a new independent one if they are unhappy about has been generated previously. -- Michael signature.asc Description: PGP signature
Re: Refactor query normalization into core query jumbling
> While this is technically correct so the compiler does not complain (because > clocations is a non const pointer in JumbleState and the added const does not > apply to what clocations points to), I think that adding const here is > misleading. Yes, I am not happy with this. I initially thought it would be OK to modify the JumbleState as long as it is done in a core function, but after much thought, this is neither a good idea nor safe. If a pointer is marked as a Constant, we should not modify it. So, I went back to think about this, and the core problem as I see it is that multiple hooks on the chain can modify the constant lengths. For example, pg_stat_statements can now modify the constant lengths in JumbleState via fill_in_constant_lengths, and the same JumbleState can have its constant locations modified in a different way. At this time, constant lengths are the only part of the JumbleState that is not set by core. So, I don't think post_parse_analyze receiving JumbleState as a constant is the right approach, nor do we need it. I think we should allow JumbleState to define a normalization callback, which defaults to a core normalization function rather than an extension specific one: ``` jstate->normalize_query = GenerateNormalizedQuery; ``` This way, any extension that wishes to return a normalized string from the same JumbleState can invoke this callback and get consistent results. pg_stat_statements and other extensions with a need to normalize a query string based on the locations of a JumbleState do not need to care about the internals of normalization, they simply invoke the callback and receive the final string. So v2-0001 implements this callback and moves the normalization logic into core. Both changes must be done in the same patch. The comments are also updated where they are no longer applicable or could be improved. One additional improvement that this patch did not include is a bool in JumbleState that tracks whether a normalized string has already been generated. This way, repeated calls to the callback would not need to regenerate the string; only the first call would perform the work, while subsequent calls could simply return the previously computed normalized string. I do like the simplicity of this approach and it removes pg_stat_statements from having to own the normalization code and how well different extensions sharing the same JumbleState can play together. Not yet sure if this is the correct direction, and I am open to other ideas. -- Sami Imseih Amazon Web Services (AWS) v2-0001-pg_stat_statements-enforce-single-query-normaliza.patch Description: Binary data
Re: Refactor query normalization into core query jumbling
Hi, On Thu, Dec 18, 2025 at 06:44:17PM -0600, Sami Imseih wrote: > For the second point, since JumbleState can be shared by multiple > extensions, hooks should receive it as a const pointer. This > signals read-only intent and prevents extensions from > accidentally modifying it through the hook. This change is in > 0002. This does mean that extensions will need to update their > hooks, but I do not see that as an issue for a major version. I can see that 0001 is doing: -static void fill_in_constant_lengths(JumbleState *jstate, const char *query, and then: +static void +fill_in_constant_lengths(const JumbleState *jstate, const char *query, Should the const addition be in 0002? But... While looking at fill_in_constant_lengths(), I can see that it is doing: locs = jstate->clocations; and then things like: locs[i].length = -1 or locs[i].length = strlen(yyextra.scanbuf + loc) While this is technically correct so the compiler does not complain (because clocations is a non const pointer in JumbleState and the added const does not apply to what clocations points to), I think that adding const here is misleading. Indeed, the function clearly modifies data accessible through the parameter. Thoughts? Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
