On Wed, 7 Dec 2022 at 13:27, Michael Paquier <mich...@paquier.xyz> wrote: > > Hi all, > > This thread is a follow-up of the recent discussion about query > jumbling with DDL statements, where the conclusion was that we'd want > to generate all this code automatically for all the nodes: > https://www.postgresql.org/message-id/36e5bffe-e989-194f-85c8-06e7bc88e...@amazon.com > > What this patch allows to do it to compute the same query ID for > utility statements using their parsed Node state instead of their > string, meaning that things like "BEGIN", "bEGIN" or "begin" would be > treated the same, for example. But the main idea is not only that. > > I have implemented that as of the attached, where the following things > are done: > - queryjumble.c is moved to src/backend/nodes/, to stick with the > other things for node equal/read/write/copy, renamed to > jumblefuncs.c. > - gen_node_support.c is extended to generate the functions and the > switch for the jumbling. There are a few exceptions, as of the Lists > and RangeTblEntry to do the jumbling consistently. > - Two pg_node_attr() are added in consistency with the existing ones: > no_jumble to discard completely a node from the the query jumbling > and jumble_ignore to discard one field from the jumble. > > The patch is in a rather good shape, passes check-world and the CI, > but there are a few things that need to be discussed IMO. Things > could be perhaps divided in more patches, now the areas touched are > quite different so it did not look like a big deal to me as the > changes touch different areas and are straight-forward. > > The location of the Nodes is quite invasive because we only care about > that for T_Const now in the query jumbling, and this could be > compensated with a third pg_node_attr() that tracks for the "int > location" of a Node whether it should participate in the jumbling or > not. There is also an argument where we would want to not include by > default new fields added to a Node, but that would require added more > pg_node_attr() than what's done here. > > Note that the plan is to extend the normalization to some other parts > of the Nodes, like CALL and SET as mentioned on the other thread. I > have done nothing about that yet but doing so can be done in a few > lines with the facility presented here (aka just add a location > field). Hence, the normalization is consistent with the existing > queryjumble.c for the fields and the nodes processed. > > In this patch, things are done so as the query ID is not computed > anymore from the query string but from the Query. I still need to > study the performance impact of that with short queries. If things > prove to be noticeable in some cases, this stuff could be switched to > use a new GUC where we could have a code path for the computation of > utilityStmt using its string as a fallback. I am not sure that I want > to enter in this level of complications, though, to keep things > simple, but that's yet to be done. > > A bit more could be cut but pg_ident got in the way.. There are also > a few things for pg_stat_statements where a query ID of 0 can be > implied for utility statements in some cases. > > Generating this code leads to an overall removal of code as what > queryjumble.c is generated automatically: > 13 files changed, 901 insertions(+), 1113 deletions(-) > > I am adding that to the next commit fest.
The patch does not apply on top of HEAD as in [1], please post a rebased patch: === Applying patches on top of PostgreSQL commit ID b82557ecc2ebbf649142740a1c5ce8d19089f620 === === applying patch ./0001-Support-for-automated-query-jumble-with-all-Nodes.patch ... patching file src/backend/utils/misc/queryjumble.c Hunk #1 FAILED at 1. Not deleting file src/backend/utils/misc/queryjumble.c as content differs from patch 1 out of 1 hunk FAILED -- saving rejects to file src/backend/utils/misc/queryjumble.c.rej [1] - http://cfbot.cputube.org/patch_41_4047.log Regards, Vignesh