Re: Multiple Query IDs for a rewritten parse tree

Tom Lane Sat, 08 Jan 2022 16:50:19 -0800

Andrey Lepikhov <a.lepik...@postgrespro.ru> writes:
> On 5/1/2022 10:13, Tom Lane wrote:
>>> I feel like we need to get away from the idea that there is just
>>> one query hash, and somehow let different extensions attach
>>> differently-calculated hashes to a query.  I don't have any immediate
>>> ideas about how to do that in a reasonably inexpensive way.


> After thinking for a while, I came up with three different ways to implement it:
> 1. queryId will be a trivial 64-bit counter.

This seems pretty useless.  The whole point of the query hash, at
least for many use-cases, is to allow recognizing queries that are
the same or similar.

> 2. Instead of a simple queryId, we can store a list of pairs (QueryId,
> funcOid). An extension can register a callback for queryId generation,
> and the core will form a list of queryIds right after query tree
> rewriting. funcOid is needed to distinguish jumbling implementations.
> Here we would need to invent an additional node type for an element of
> the list.

I'm not sure that funcOid is a reasonable way to tag different hash
calculation methods, because it isn't necessarily stable across
installations.  For the same reason, it'd be hard for two extensions
to collaborate on a common query-hash definition.
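A rough sketch of what option 2's list element might look like, with hypothetical names (`QueryIdEntry`, `find_query_id_by_func`) and a local `Oid` typedef standing in for PostgreSQL's. A consumer must already know the `func_oid` it is looking for; since a function's OID is assigned when the function is created, that value can differ between installations, which is the stability problem described above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t Oid;           /* stand-in for PostgreSQL's 32-bit Oid */

/* Hypothetical list element for option 2: one hash per jumbling function. */
typedef struct QueryIdEntry
{
    uint64_t    query_id;       /* hash computed by the function below */
    Oid         func_oid;       /* OID of the jumbling function that produced it */
} QueryIdEntry;

/*
 * Return the hash produced by func_oid, or 0 (meaning "not computed")
 * if no entry for that function is present.
 */
uint64_t
find_query_id_by_func(const QueryIdEntry *entries, size_t n, Oid func_oid)
{
    assert(entries != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (entries[i].func_oid == func_oid)
            return entries[i].query_id;
    return 0;
}
```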

> 3. Instead of queryId, we could add a multi-purpose 'private' list to
> the Query struct. Any extension can add additional object(s) to this
> list (with a registered object type, of course). As an example, I can
> imagine a convention for queryIds in that case: store a String node
> with the value '<extension name> - <Query ID>'.

Again, this is presuming that every extension is totally independent
and has no interest in what any other code is doing.  But I don't
think we want to make every extension that wants a hash duplicate
the whole of queryjumble.c.

The idea I'd been vaguely thinking about is to allow attaching a list
of query-hash nodes to a Query, where each node would contain a "tag"
identifying the specific hash calculation method, and also the value
of the query's hash calculated according to that method.  We could
probably get away with saying that all such hash values must be uint64.
The main difference from your function-OID idea, I think, is that
I'm envisioning the tags as being small integers with well-known
values, similarly to the way we manage stakind values in pg_statistic.
In this way, an extension that wants a hash that the core knows how
to calculate doesn't need its own copy of the code, and similarly
one extension could publish a calculation method for use by other
extensions.
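A minimal sketch of the tagged list described above, assuming a fixed-size array in place of a PostgreSQL List and hypothetical tag constants and helper names (`QUERYHASH_*`, `QueryHash`, `QueryHashList`, `query_hash_set`, `query_hash_get`). The tags play the same role as the well-known codes stored in pg_statistic's stakind columns: any module that knows a tag's number can read the corresponding hash without knowing who computed it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical well-known tags, assigned by convention much like the
 * STATISTIC_KIND_* codes. Names and numbers are illustrative only.
 */
#define QUERYHASH_CORE_JUMBLE   1   /* e.g. computed by core query jumbling */
#define QUERYHASH_EXT_EXAMPLE   2   /* e.g. published by some extension */

typedef struct QueryHash
{
    int         tag;            /* which calculation method produced value */
    uint64_t    value;          /* hash of the query under that method */
} QueryHash;

/* Hypothetical per-Query list of hashes (a plain array, not a List). */
typedef struct QueryHashList
{
    size_t      count;
    QueryHash   items[8];
} QueryHashList;

/* Attach or overwrite the hash for tag; returns 0 if the list is full. */
int
query_hash_set(QueryHashList *list, int tag, uint64_t value)
{
    assert(list != NULL && tag > 0);
    for (size_t i = 0; i < list->count; i++)
    {
        if (list->items[i].tag == tag)
        {
            list->items[i].value = value;
            return 1;
        }
    }
    if (list->count == sizeof(list->items) / sizeof(list->items[0]))
        return 0;
    list->items[list->count].tag = tag;
    list->items[list->count].value = value;
    list->count++;
    return 1;
}

/* Look up the hash for tag; returns 1 and stores it in *value if present. */
int
query_hash_get(const QueryHashList *list, int tag, uint64_t *value)
{
    assert(list != NULL && value != NULL);
    for (size_t i = 0; i < list->count; i++)
    {
        if (list->items[i].tag == tag)
        {
            *value = list->items[i].value;
            return 1;
        }
    }
    return 0;
}
```

Because consumers look hashes up by tag rather than by function identity, one extension can read a hash that the core or another extension attached.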

We'd also need some mechanism for registering a function to be
used to calculate the hash for any given tag value, of course.
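The registration mechanism could look something like the following sketch. Everything here is hypothetical: the function-pointer signature, the fixed-size registry indexed by tag, and the names `register_query_hash_method`, `compute_query_hash` and `example_hash_method`. A real method would presumably walk the parse tree the way queryjumble.c does; the example method just returns a placeholder constant.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Opaque stand-in for PostgreSQL's Query node. */
typedef struct Query Query;

/* Hypothetical signature of a hash-calculation method. */
typedef uint64_t (*query_hash_fn) (const Query *query);

#define MAX_QUERY_HASH_TAG 64

/* Hypothetical registry: at most one calculation function per tag. */
static query_hash_fn query_hash_methods[MAX_QUERY_HASH_TAG];

/*
 * Register fn for tag. Returns 0 if the tag is out of range or is
 * already claimed by a different function.
 */
int
register_query_hash_method(int tag, query_hash_fn fn)
{
    assert(fn != NULL);
    if (tag <= 0 || tag >= MAX_QUERY_HASH_TAG)
        return 0;
    if (query_hash_methods[tag] != NULL && query_hash_methods[tag] != fn)
        return 0;
    query_hash_methods[tag] = fn;
    return 1;
}

/* Compute the hash for tag with whichever module registered it. */
int
compute_query_hash(int tag, const Query *query, uint64_t *value)
{
    assert(value != NULL);
    if (tag <= 0 || tag >= MAX_QUERY_HASH_TAG || query_hash_methods[tag] == NULL)
        return 0;
    *value = query_hash_methods[tag] (query);
    return 1;
}

/* Placeholder method; a real one would hash the query's parse tree. */
uint64_t
example_hash_method(const Query *query)
{
    (void) query;
    return 12345;
}
```

Refusing to re-register a tag with a different function keeps two modules from silently disagreeing about what a given tag means.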

                        regards, tom lane
