On Fri, Mar 14, 2025 at 04:14:25PM +1300, David Rowley wrote:
> I don't think I'm discarding them... As far as I'm aware, your
> remaining concern is with custom jumble functions and you showed an
> example in [1] of a hypothetical jumble function that could cause the
> same bug as this thread is discussing.  My response to that was that
> the custom jumble function is broken and should be fixed, which seems
> to me to be analogous to Ivan's proposal to fix _jumbleNode() to do
> something with NULL pointers.
Okay, that's where our opinions diverge, so let me reformulate with
different words and terms.  This is not a problem of having a NULL node
vs a non-NULL node in the jumbling, because by design it is possible to
decide whether a Node should be included in the jumbling depending on
its internal values, particularly if it gets used across more than one
type of parent node.  You are claiming that such functions are broken;
what I am saying is that such a function can be OK depending on how one
wants this code to behave for their node structure and what they expect
from it.  So my claim is about keeping this facility more flexible,
while also addressing the problem of the same node being defined
successively more than once in a parent structure.

> I now can't tell which of the following is true: 1) I've missed one of
> your concerns, or; 2) you want to find a way to make badly coded
> custom jumble functions not suffer from this bug, or; 3) you think
> that adding jumble bytes unconditionally regardless of the field's
> value still does not fix the bug in question, or; 4) something else.
> It would be good to know which of these it is.

I hope that my opinion counts, as I've worked on this whole design in
3db72ebcbe20, 8eba3e3f0208 and other related commits.  FWIW, I don't
really see any disadvantage in supporting what I'm claiming is OK,
giving more flexibility to folks to do what they want with this
facility, if it also tackles this thread's problem with the Node
handling.

So, attached is a counter-proposal, where we simply add a counter
tracking the node count in _jumbleNode() to add more entropy to the
mix, incrementing it as well for NULL nodes.

By the way, I was looking at the set of tests proposed upthread in this
message, which is as far as I know the latest version proposed:
https://www.postgresql.org/message-id/5ac172e0b77a4baba50671cd1a15285f@localhost.localdomain

The tests require neither a relation nor a stats reset, so let's make
them cheaper, as I've proposed upthread and in the attached, including
checks with multiple queries and different constants to make sure that
these are correctly grouped in the pgss reports.  Note also that the
null_sequence_number being reset each time we run through a non-NULL
node in variant B reduces the entropy a bit.

Adding a counter in AppendJumble() would be the wrong level for such a
change.  Anyway, perhaps we should have your extra '\0' byte, or
something equivalent, added in JUMBLE_STRING() when dealing with a NULL
string rather than ignoring it.  There's an argument for simplicity,
IMO, but it is true that ignoring a NULL value reduces the entropy of
the query ID, so I'd be OK with that if you feel strongly about this
point, as it sounds like you do?  Rather than a hardcoded '\0' byte, we
could use a counter and JUMBLE_FIELD_SINGLE() in the NULL case of
JUMBLE_STRING() (see the sketch below).  It has not been proved to be a
problem yet, and there's value in having a simpler solution as well.

Anyway, one line I'd like to draw is that AppendJumble() remains as it
is written currently.  One extra thing we could do is update it so that
the size of a given item is always more than zero, enforced with an
Assert(), to apply a stricter policy.
--
Michael
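To make that JUMBLE_STRING() idea a bit more concrete, here is a rough
sketch, assuming the current shape of the jumbling macros in
queryjumblefuncs.c; the null_string_count field is purely illustrative
(it would have to be added to JumbleState) and none of this is part of
the attached patch:

/*
 * Sketch only: in the NULL case of JUMBLE_STRING(), jumble a counter
 * instead of ignoring the field, so that a NULL string still adds some
 * entropy to the query ID.  "null_string_count" is a hypothetical
 * JumbleState field used here for illustration.
 */
#define JUMBLE_STRING(str) \
do { \
	if (expr->str) \
		AppendJumble(jstate, (const unsigned char *) (expr->str), \
					 strlen(expr->str) + 1); \
	else \
	{ \
		jstate->null_string_count++; \
		JUMBLE_FIELD_SINGLE(jstate->null_string_count); \
	} \
} while (0)

The stricter policy on AppendJumble() mentioned above would then simply
be an Assert(size > 0) at the top of that function, without changing
how the bytes are folded into the jumble.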
From c78150fea9a49aa2ab06f4b03b1e5beea4457366 Mon Sep 17 00:00:00 2001
From: Michael Paquier <mich...@paquier.xyz>
Date: Mon, 17 Mar 2025 09:38:22 +0900
Subject: [PATCH v4] Add more entropy to query jumbling

A counter tracking the number of nodes computed is added, and all
nodes use it in their computation.  This matters also for NULL nodes,
that were ignored from the computation up to now.
---
 src/include/nodes/queryjumble.h             |  7 ++
 src/backend/nodes/queryjumblefuncs.c        | 10 +++
 .../pg_stat_statements/expected/select.out  | 87 ++++++++++++++++++-
 contrib/pg_stat_statements/sql/select.sql   | 20 +++++
 4 files changed, 123 insertions(+), 1 deletion(-)

diff --git a/src/include/nodes/queryjumble.h b/src/include/nodes/queryjumble.h
index 50eb95665872..aa1126f50162 100644
--- a/src/include/nodes/queryjumble.h
+++ b/src/include/nodes/queryjumble.h
@@ -37,6 +37,13 @@ typedef struct JumbleState
 	/* Number of bytes used in jumble[] */
 	Size		jumble_len;
 
+	/*
+	 * Number of Nodes included in the computation.  This counter is
+	 * incremented each time a node is added to the jumbling computation,
+	 * and is added to the jumbling to increase its entropy.
+	 */
+	int			node_count;
+
 	/* Array of locations of constants that should be removed */
 	LocationLen *clocations;
 
diff --git a/src/backend/nodes/queryjumblefuncs.c b/src/backend/nodes/queryjumblefuncs.c
index b103a2819366..ec74b92d5c3c 100644
--- a/src/backend/nodes/queryjumblefuncs.c
+++ b/src/backend/nodes/queryjumblefuncs.c
@@ -120,6 +120,7 @@ JumbleQuery(Query *query)
 	/* Set up workspace for query jumbling */
 	jstate->jumble = (unsigned char *) palloc(JUMBLE_SIZE);
 	jstate->jumble_len = 0;
+	jstate->node_count = 0;
 	jstate->clocations_buf_size = 32;
 	jstate->clocations = (LocationLen *)
 		palloc(jstate->clocations_buf_size * sizeof(LocationLen));
@@ -244,6 +245,15 @@ _jumbleNode(JumbleState *jstate, Node *node)
 {
 	Node	   *expr = node;
 
+	/*
+	 * Increment the node count, and add it to the jumbling.  This operation
+	 * is done before checking if a Node is NULL, so as even a NULL node is
+	 * counted in the computation, without depending on its data, with some
+	 * data that we know to be unique for each computation.
+	 */
+	jstate->node_count++;
+	JUMBLE_FIELD_SINGLE(jstate->node_count);
+
 	if (expr == NULL)
 		return;
 
diff --git a/contrib/pg_stat_statements/expected/select.out b/contrib/pg_stat_statements/expected/select.out
index 37a30af034a6..1587d2cafb3a 100644
--- a/contrib/pg_stat_statements/expected/select.out
+++ b/contrib/pg_stat_statements/expected/select.out
@@ -19,6 +19,86 @@ SELECT 1 AS "int";
    1
 (1 row)
 
+-- LIMIT and OFFSET patterns
+-- These require more entropy with parsing node offsets.
+SELECT 1 AS "int" LIMIT 1;
+ int
+-----
+   1
+(1 row)
+
+SELECT 1 AS "int" LIMIT 2;
+ int
+-----
+   1
+(1 row)
+
+SELECT 1 AS "int" OFFSET 1;
+ int
+-----
+(0 rows)
+
+SELECT 1 AS "int" OFFSET 2;
+ int
+-----
+(0 rows)
+
+SELECT 1 AS "int" OFFSET 1 LIMIT 1;
+ int
+-----
+(0 rows)
+
+SELECT 1 AS "int" OFFSET 2 LIMIT 2;
+ int
+-----
+(0 rows)
+
+SELECT 1 AS "int" LIMIT 1 OFFSET 1;
+ int
+-----
+(0 rows)
+
+SELECT 1 AS "int" LIMIT 3 OFFSET 3;
+ int
+-----
+(0 rows)
+
+SELECT 1 AS "int" OFFSET 1 FETCH FIRST 2 ROW ONLY;
+ int
+-----
+(0 rows)
+
+SELECT 1 AS "int" OFFSET 2 FETCH FIRST 3 ROW ONLY;
+ int
+-----
+(0 rows)
+
+-- DISTINCT and ORDER BY patterns
+-- These require more entropy with parsing node offsets.
+SELECT DISTINCT 1 AS "int";
+ int
+-----
+   1
+(1 row)
+
+SELECT DISTINCT 2 AS "int";
+ int
+-----
+   2
+(1 row)
+
+SELECT 1 AS "int" ORDER BY 1;
+ int
+-----
+   1
+(1 row)
+
+SELECT 2 AS "int" ORDER BY 1;
+ int
+-----
+   2
+(1 row)
+
 /* this comment should not appear in the output */
 SELECT 'hello'
 -- but this one will appear
@@ -135,9 +215,14 @@ SELECT calls, rows, query FROM pg_stat_statements ORDER BY query COLLATE "C";
     3 |    3 | SELECT $1 + $2 + $3 AS "add"
     1 |    1 | SELECT $1 AS "float"
     2 |    2 | SELECT $1 AS "int"
+    2 |    2 | SELECT $1 AS "int" LIMIT $2
+    2 |    0 | SELECT $1 AS "int" OFFSET $2
+    6 |    0 | SELECT $1 AS "int" OFFSET $2 LIMIT $3
+    2 |    2 | SELECT $1 AS "int" ORDER BY 1
     1 |    2 | SELECT $1 AS i UNION SELECT $2 ORDER BY i
     1 |    1 | SELECT $1 || $2
     1 |    1 | SELECT $1, $2 LIMIT $3
+    2 |    2 | SELECT DISTINCT $1 AS "int"
     0 |    0 | SELECT calls, rows, query FROM pg_stat_statements ORDER BY query COLLATE "C"
     1 |    1 | SELECT pg_stat_statements_reset() IS NOT NULL AS t
     1 |    2 | WITH t(f) AS (                                                               +
@@ -145,7 +230,7 @@ SELECT calls, rows, query FROM pg_stat_statements ORDER BY query COLLATE "C";
       |      |   )                                                                          +
       |      |   SELECT f FROM t ORDER BY f
     1 |    1 | select $1::jsonb ? $2
-(12 rows)
+(17 rows)
 
 SELECT pg_stat_statements_reset() IS NOT NULL AS t;
  t
diff --git a/contrib/pg_stat_statements/sql/select.sql b/contrib/pg_stat_statements/sql/select.sql
index e0be58d5e24b..4dcfa8ef74dc 100644
--- a/contrib/pg_stat_statements/sql/select.sql
+++ b/contrib/pg_stat_statements/sql/select.sql
@@ -12,6 +12,26 @@ SELECT pg_stat_statements_reset() IS NOT NULL AS t;
 --
 SELECT 1 AS "int";
 
+-- LIMIT and OFFSET patterns
+-- These require more entropy with parsing node offsets.
+SELECT 1 AS "int" LIMIT 1;
+SELECT 1 AS "int" LIMIT 2;
+SELECT 1 AS "int" OFFSET 1;
+SELECT 1 AS "int" OFFSET 2;
+SELECT 1 AS "int" OFFSET 1 LIMIT 1;
+SELECT 1 AS "int" OFFSET 2 LIMIT 2;
+SELECT 1 AS "int" LIMIT 1 OFFSET 1;
+SELECT 1 AS "int" LIMIT 3 OFFSET 3;
+SELECT 1 AS "int" OFFSET 1 FETCH FIRST 2 ROW ONLY;
+SELECT 1 AS "int" OFFSET 2 FETCH FIRST 3 ROW ONLY;
+
+-- DISTINCT and ORDER BY patterns
+-- These require more entropy with parsing node offsets.
+SELECT DISTINCT 1 AS "int";
+SELECT DISTINCT 2 AS "int";
+SELECT 1 AS "int" ORDER BY 1;
+SELECT 2 AS "int" ORDER BY 1;
+
 /* this comment should not appear in the output */
 SELECT 'hello'
 -- but this one will appear
-- 
2.47.2