> While this is technically correct so the compiler does not complain (because
> clocations is a non const pointer in JumbleState and the added const does not
> apply to what clocations points to), I think that adding const here is 
> misleading.

Yes, I am not happy with this. I initially thought it would be OK to
modify the JumbleState as long as it is done in a core function, but
after much thought, this is neither a good idea nor safe. If a pointer
is marked as a Constant, we should not modify it.

So, I went back to think about this, and the core problem as I see it
is that multiple hooks on the chain can modify the constant lengths. For
example, pg_stat_statements can now modify the constant lengths in
JumbleState via fill_in_constant_lengths, and the same JumbleState can
have its constant locations modified in a different way.

At this time, constant lengths are the only part of the JumbleState that
is not set by core. So, I don't think post_parse_analyze receiving
JumbleState as a constant is the right approach, nor do we need it.

I think we should allow JumbleState to define a normalization callback,
which defaults to a core normalization function rather than an
extension specific one:

```
    jstate->normalize_query = GenerateNormalizedQuery;
```

This way, any extension that wishes to return a normalized string from
the same JumbleState can invoke this callback and get consistent results.
pg_stat_statements and other extensions with a need to normalize a query
string based on the locations of a JumbleState do not need to care about the
internals of normalization, they simply invoke the callback and
receive the final
string.

So v2-0001 implements this callback and moves the normalization logic
into core. Both changes must be done in the same patch.  The comments
are also updated where they are no longer applicable or could be improved.

One additional improvement that this patch did not include is a bool in
JumbleState that tracks whether a normalized string has already been
generated. This way, repeated calls to the callback would not need to
regenerate the string; only the first call would perform the work,
while subsequent calls could simply return the previously computed
normalized string.

I do like the simplicity of this approach and it removes pg_stat_statements
from having to own the normalization code and how well different extensions
sharing the same JumbleState can play together. Not yet sure if this is the
correct direction, and I am open to other ideas.

--
Sami Imseih
Amazon Web Services (AWS)

Attachment: v2-0001-pg_stat_statements-enforce-single-query-normaliza.patch
Description: Binary data

Reply via email to