Re: Normalization of utility queries in pg_stat_statements

2023-08-17 Thread Michael Paquier
On Wed, Aug 16, 2023 at 05:11:47PM +0800, jian he wrote:
> SELECT calls, toplevel, rows, query FROM pg_stat_statements ORDER BY
> query COLLATE "C";
> returns:
>  calls | toplevel | rows |   query
> ---+--+--+
>  1 | t|0 | CALL ptest3($1)
>  2 | f|2 | INSERT INTO cp_test VALUES ($2, x)
>  1 | t|1 | SELECT pg_stat_statements_reset()
> 
> here, the intermediate CALL part is optimized away. or should I expect
> CALL ptest1($1) also in pg_stat_statements?

I would have guessed that ptest1() being called as part of ptest3()
should show up in the report if you use track = all, as all the nested
queries of a function, even if it is pure SQL, ought to show up.  Now
note that ptest1() not showing up is not a new behavior, ~15 does the
same thing by missing it.
--
Michael


signature.asc
Description: PGP signature


Re: Normalization of utility queries in pg_stat_statements

2023-08-16 Thread jian he
On Wed, Mar 8, 2023 at 2:19 PM Michael Paquier  wrote:
>
> On Fri, Mar 03, 2023 at 09:37:27AM +0900, Michael Paquier wrote:
> > Thanks for double-checking, applied 0001 to finish this part of the
> > work.  I am attaching the remaining bits as of the attached, combined
> > into a single patch.
>
> Doing so as a single patch was not feeling right as this actually
> fixes issues with the location calculations for the Const node, so I
> have split that into three commits and finally applied the whole.
>
> As a bonus, please see attached a patch to apply the normalization to
> CALL statements using the new automated infrastructure.  OUT
> parameters can be passed to a procedure, hence I guess that these had
> better be silenced as well.  This is not aimed at being integrated,
> just for reference.
> --
> Michael

I tested it. all normally works fine. only 1 corner case:
set pg_stat_statements.track =  'all';
drop table if Exists cp_test;
CREATE TABLE cp_test (a int, b text);
CREATE or REPLACE PROCEDURE ptest1(x text) LANGUAGE SQL AS $$ INSERT
INTO cp_test VALUES (1, x); $$;

CREATE or REPLACE PROCEDURE ptest3(y text)
LANGUAGE SQL
AS $$
CALL ptest1(y);
CALL ptest1($1);
$$;
SELECT pg_stat_statements_reset();

CALL ptest3('b');

SELECT calls, toplevel, rows, query FROM pg_stat_statements ORDER BY
query COLLATE "C";
returns:
 calls | toplevel | rows |   query
---+--+--+
 1 | t|0 | CALL ptest3($1)
 2 | f|2 | INSERT INTO cp_test VALUES ($2, x)
 1 | t|1 | SELECT pg_stat_statements_reset()

here, the intermediate CALL part is optimized away. or should I expect
CALL ptest1($1) also in pg_stat_statements?




Re: Normalization of utility queries in pg_stat_statements

2023-03-07 Thread Michael Paquier
On Fri, Mar 03, 2023 at 09:37:27AM +0900, Michael Paquier wrote:
> Thanks for double-checking, applied 0001 to finish this part of the
> work.  I am attaching the remaining bits as of the attached, combined
> into a single patch.

Doing so as a single patch was not feeling right as this actually
fixes issues with the location calculations for the Const node, so I
have split that into three commits and finally applied the whole.

As a bonus, please see attached a patch to apply the normalization to
CALL statements using the new automated infrastructure.  OUT
parameters can be passed to a procedure, hence I guess that these had
better be silenced as well.  This is not aimed at being integrated,
just for reference.
--
Michael
From 7fff41b050b8da4b7a006071b51ae5fa43bc7c0b Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Wed, 8 Mar 2023 15:14:15 +0900
Subject: [PATCH] Apply normalization to CALL statements

---
 src/include/nodes/parsenodes.h|  6 ++--
 src/backend/nodes/queryjumblefuncs.c  | 12 +++
 .../pg_stat_statements/expected/utility.out   | 33 ---
 contrib/pg_stat_statements/sql/utility.sql|  9 +
 4 files changed, 53 insertions(+), 7 deletions(-)

diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index 259e814253..32e5f535c1 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -3227,12 +3227,14 @@ typedef struct InlineCodeBlock
  */
 typedef struct CallStmt
 {
+	pg_node_attr(custom_query_jumble)
+
 	NodeTag		type;
 	FuncCall   *funccall;		/* from the parser */
 	/* transformed call, with only input args */
-	FuncExpr   *funcexpr pg_node_attr(query_jumble_ignore);
+	FuncExpr   *funcexpr;
 	/* transformed output-argument expressions */
-	List	   *outargs pg_node_attr(query_jumble_ignore);
+	List	   *outargs;
 } CallStmt;
 
 typedef struct CallContext
diff --git a/src/backend/nodes/queryjumblefuncs.c b/src/backend/nodes/queryjumblefuncs.c
index d7fd72d70f..709f91ab0e 100644
--- a/src/backend/nodes/queryjumblefuncs.c
+++ b/src/backend/nodes/queryjumblefuncs.c
@@ -52,6 +52,7 @@ static void _jumbleNode(JumbleState *jstate, Node *node);
 static void _jumbleA_Const(JumbleState *jstate, Node *node);
 static void _jumbleList(JumbleState *jstate, Node *node);
 static void _jumbleRangeTblEntry(JumbleState *jstate, Node *node);
+static void _jumbleCallStmt(JumbleState *jstate, Node *node);
 
 /*
  * Given a possibly multi-statement source string, confine our attention to the
@@ -395,3 +396,14 @@ _jumbleRangeTblEntry(JumbleState *jstate, Node *node)
 			break;
 	}
 }
+
+static void
+_jumbleCallStmt(JumbleState *jstate, Node *node)
+{
+	CallStmt *expr = (CallStmt *) node;
+	FuncExpr *func = expr->funcexpr;
+
+	JUMBLE_FIELD_SINGLE(func->funcid);
+	_jumbleNode(jstate, (Node *) func->args);
+	_jumbleNode(jstate, (Node *) expr->outargs);
+}
diff --git a/contrib/pg_stat_statements/expected/utility.out b/contrib/pg_stat_statements/expected/utility.out
index 0047aba5d1..b5a8c6937c 100644
--- a/contrib/pg_stat_statements/expected/utility.out
+++ b/contrib/pg_stat_statements/expected/utility.out
@@ -240,6 +240,12 @@ DECLARE
 BEGIN
   SELECT (i + i)::int INTO r;
 END; $$ LANGUAGE plpgsql;
+CREATE OR REPLACE PROCEDURE sum_out(IN i int, IN j int, OUT k int, OUT l int) AS $$
+DECLARE
+  r int;
+BEGIN
+  SELECT (i + i)::int INTO r;
+END; $$ LANGUAGE plpgsql;
 CREATE OR REPLACE PROCEDURE sum_two(i int, j int) AS $$
 DECLARE
   r int;
@@ -256,15 +262,32 @@ CALL sum_one(3);
 CALL sum_one(199);
 CALL sum_two(1,1);
 CALL sum_two(1,2);
+CALL sum_out(1,1,1,1);
+ k | l 
+---+---
+   |  
+(1 row)
+
+CALL sum_out(2,2,1,1);
+ k | l 
+---+---
+   |  
+(1 row)
+
+CALL sum_out(2,3,4,5);
+ k | l 
+---+---
+   |  
+(1 row)
+
 SELECT calls, rows, query FROM pg_stat_statements ORDER BY query COLLATE "C";
  calls | rows |   query   
 ---+--+---
- 1 |0 | CALL sum_one(199)
- 1 |0 | CALL sum_one(3)
- 1 |0 | CALL sum_two(1,1)
- 1 |0 | CALL sum_two(1,2)
+ 2 |0 | CALL sum_one($1)
+ 3 |0 | CALL sum_out($1,$2,$3,$4)
+ 2 |0 | CALL sum_two($1,$2)
  1 |1 | SELECT pg_stat_statements_reset()
-(5 rows)
+(4 rows)
 
 -- COPY
 CREATE TABLE copy_stats (a int, b int);
diff --git a/contrib/pg_stat_statements/sql/utility.sql b/contrib/pg_stat_statements/sql/utility.sql
index 225d30a62a..dbf38c31bc 100644
--- a/contrib/pg_stat_statements/sql/utility.sql
+++ b/contrib/pg_stat_statements/sql/utility.sql
@@ -131,6 +131,12 @@ DECLARE
 BEGIN
   SELECT (i + i)::int INTO r;
 END; $$ LANGUAGE plpgsql;
+CREATE OR REPLACE PROCEDURE sum_out(IN i int, IN j int, OUT k int, OUT l int) AS $$
+DECLARE
+  r int;
+BEGIN
+  SELECT (i + i)::int INTO r;
+END; $$ LANGUAGE plpgsql;
 CREATE OR REPLACE PROCEDURE sum_two(i int, j int) AS $$
 DECLARE
   r int;
@@ -142,6 +148,9 @@ CALL sum_one(3);
 CALL sum_one(199);
 CALL sum_two(1,1);
 CALL sum_two(1,2);
+CALL 

Re: Normalization of utility queries in pg_stat_statements

2023-03-06 Thread Michael Paquier
On Mon, Mar 06, 2023 at 03:50:55PM +0300, Andrei Zubkov wrote:
> Those statements is not related to any WAL tests. It seems a little bit
> incorrect to me.

The intention is to have each file run in isolation, so this is
incorrect as it stands.  Thanks for the report!
--
Michael


signature.asc
Description: PGP signature


Re: Normalization of utility queries in pg_stat_statements

2023-03-06 Thread Andrei Zubkov
Hi Michael!

I'm rebasing a patch "Tracking statements entry timestamp in
pg_stat_statements" for applying after this patch. I've noted that
current tests are not quite independent one from another. There is two
statements in the end of user_activity.sql test:

DROP ROLE regress_stats_user1;
DROP ROLE regress_stats_user2;

Those are done after the last pg_stat_statements_reset call in this
test file and thus, those are included in checks of wal.out file:

query 
| calls | rows | wal_bytes_generated | wal_records_generated |
wal_records_ge_rows 
---
---+---+--+-+---+--
---
 DELETE FROM pgss_wal_tab WHERE a > $1
| 1 |1 | t   | t | t
 DROP ROLE regress_stats_user1
| 1 |0 | t   | t | t
 DROP ROLE regress_stats_user2
| 1 |0 | t   | t | t

Those statements is not related to any WAL tests. It seems a little bit
incorrect to me.

Are we need some changes here?
-- 
Andrei Zubkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company






Re: Normalization of utility queries in pg_stat_statements

2023-03-02 Thread Michael Paquier
On Thu, Mar 02, 2023 at 08:12:24AM +0100, Drouvot, Bertrand wrote:
> Applying 0001 produces:
> 
> Applying: Split more regression tests of pg_stat_statements
> .git/rebase-apply/patch:1735: new blank line at EOF.
> +
> .git/rebase-apply/patch:2264: new blank line at EOF.
> +
> warning: 2 lines add whitespace errors.

Indeed, removed.

> What about removing those comments?

Removing these two as well.

> Except from the Nits above, 0001 LGTM.

Thanks for double-checking, applied 0001 to finish this part of the
work.  I am attaching the remaining bits as of the attached, combined
into a single patch.  I am going to look at it again at the beginning
of next week and potentially apply it so as the normalization reflects
to the reports of pg_stat_statements.
--
Michael
From f96f6a65d5318b059527230d074660ffec099129 Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Fri, 3 Mar 2023 09:28:13 +0900
Subject: [PATCH v6] Remove normalization of A_Const nodes

Doing so leads to weird cases with commands that can define a
transaction isolation (SET TRANSACTION and BEGIN), as the normalization
is not able to copy with the full field, yet.

Applying normalization of Const nodes to DDLs changes the states of the
following commands:
- DECLARE
- EXPLAIN
- CREATE MATERIALIZED VIEW
- CTAS

At the end, this should be merged with the previous patch, but keeping
it separate shows the difference of behavior between the two approaches
in the regression tests of pg_stat_statements.
---
 src/include/nodes/parsenodes.h|  8 +++-
 src/include/nodes/primnodes.h |  9 +++--
 .../pg_stat_statements/expected/cursors.out   | 14 +++
 .../pg_stat_statements/expected/utility.out   | 38 +--
 .../pg_stat_statements/pg_stat_statements.c   |  4 +-
 5 files changed, 41 insertions(+), 32 deletions(-)

diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index f7d7f10f7d..259e814253 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -3221,14 +3221,18 @@ typedef struct InlineCodeBlock
  * list contains copies of the expressions for all output arguments, in the
  * order of the procedure's declared arguments.  (outargs is never evaluated,
  * but is useful to the caller as a reference for what to assign to.)
+ * The transformed call state is not relevant in the query jumbling, only the
+ * function call is.
  * --
  */
 typedef struct CallStmt
 {
 	NodeTag		type;
 	FuncCall   *funccall;		/* from the parser */
-	FuncExpr   *funcexpr;		/* transformed call, with only input args */
-	List	   *outargs;		/* transformed output-argument expressions */
+	/* transformed call, with only input args */
+	FuncExpr   *funcexpr pg_node_attr(query_jumble_ignore);
+	/* transformed output-argument expressions */
+	List	   *outargs pg_node_attr(query_jumble_ignore);
 } CallStmt;
 
 typedef struct CallContext
diff --git a/src/include/nodes/primnodes.h b/src/include/nodes/primnodes.h
index b4292253cc..4220c63ab7 100644
--- a/src/include/nodes/primnodes.h
+++ b/src/include/nodes/primnodes.h
@@ -128,8 +128,10 @@ typedef struct TableFunc
  * CREATE MATERIALIZED VIEW
  *
  * For CREATE MATERIALIZED VIEW, viewQuery is the parsed-but-not-rewritten
- * SELECT Query for the view; otherwise it's NULL.  (Although it's actually
- * Query*, we declare it as Node* to avoid a forward reference.)
+ * SELECT Query for the view; otherwise it's NULL.  This is irrelevant in
+ * the query jumbling as CreateTableAsStmt already includes a reference to
+ * its own Query, so ignore it.  (Although it's actually Query*, we declare
+ * it as Node* to avoid a forward reference.)
  */
 typedef struct IntoClause
 {
@@ -141,7 +143,8 @@ typedef struct IntoClause
 	List	   *options;		/* options from WITH clause */
 	OnCommitAction onCommit;	/* what do we do at COMMIT? */
 	char	   *tableSpaceName; /* table space to use, or NULL */
-	Node	   *viewQuery;		/* materialized view's SELECT query */
+	/* materialized view's SELECT query */
+	Node	   *viewQuery pg_node_attr(query_jumble_ignore);
 	bool		skipData;		/* true for WITH NO DATA */
 } IntoClause;
 
diff --git a/contrib/pg_stat_statements/expected/cursors.out b/contrib/pg_stat_statements/expected/cursors.out
index 5d0dc196f9..46375ea905 100644
--- a/contrib/pg_stat_statements/expected/cursors.out
+++ b/contrib/pg_stat_statements/expected/cursors.out
@@ -16,10 +16,10 @@ CLOSE cursor_stats_1;
 DECLARE cursor_stats_1 CURSOR WITH HOLD FOR SELECT 2;
 CLOSE cursor_stats_1;
 SELECT calls, rows, query FROM pg_stat_statements ORDER BY query COLLATE "C";
- calls | rows |query 
+--+--
+ calls | rows | query 
+---+--+---
  2 |0 | CLOSE cursor_stats_1
- 2 |0 | DECLARE cursor_stats_1 CURSOR WITH HOLD FOR 

Re: Normalization of utility queries in pg_stat_statements

2023-03-01 Thread Drouvot, Bertrand

Hi,

On 3/1/23 5:47 AM, Michael Paquier wrote:

On Mon, Feb 20, 2023 at 11:34:59AM +0900, Michael Paquier wrote:

With the patches..


Attached is an updated patch set, where I have done more refactoring
work for the regression tests of pg_stat_statements, splitting
pg_stat_statments.sql into the following files:
- user_activity.sql for the role-level resets.
- wal.sql for the WAL generation tracking.
- dml.sql for insert/update/delete/merge and row counts.
- The main file is renamed to select.sql, as it now only covers SELECT
patterns.



Thanks!

Splitting even more and removing pg_stat_statements.sql/out does make sense to 
me,
so +1 for the patch.

Applying 0001 produces:

Applying: Split more regression tests of pg_stat_statements
.git/rebase-apply/patch:1735: new blank line at EOF.
+
.git/rebase-apply/patch:2264: new blank line at EOF.
+
warning: 2 lines add whitespace errors.


Nits:

+++ b/contrib/pg_stat_statements/sql/wal.sql
@@ -0,0 +1,22 @@
+--
+-- Validate WAL generation metrics
+--
+
+SET pg_stat_statements.track_utility = FALSE;
+
+-- utility "create table" should not be shown

This comment is coming from the previous pg_stat_statements.sql but
I wonder if it makes sense here as testing utility is not the initial purpose
of wal.sql.

Same comment for dml.sql:

+-- utility "create table" should not be shown
+CREATE TEMP TABLE pgss_dml_tab (a int, b char(20));

What about removing those comments?


There is no change in the code coverage or the patterns tested.


I had a look (comparing the new .sql files with the old pg_stat_statements.sql 
content) and I agree.

Except from the Nits above, 0001 LGTM.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Normalization of utility queries in pg_stat_statements

2023-02-28 Thread Michael Paquier
On Mon, Feb 20, 2023 at 11:34:59AM +0900, Michael Paquier wrote:
> With the patches..

Attached is an updated patch set, where I have done more refactoring
work for the regression tests of pg_stat_statements, splitting
pg_stat_statments.sql into the following files:
- user_activity.sql for the role-level resets.
- wal.sql for the WAL generation tracking.
- dml.sql for insert/update/delete/merge and row counts.
- The main file is renamed to select.sql, as it now only covers SELECT
patterns.

There is no change in the code coverage or the patterns tested.  And
with that, I am rather comfortable with the shape of the regression
tests moving forward.

0002 and 0003 are equivalent to the previous 0003 and 0004 in v4, that
switch pg_stat_statements to apply the normalization to utilities that
use Const nodes.
--
Michael
From 23476ee28a8e5bb82859369a8f54e9cd3c45fc32 Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Wed, 1 Mar 2023 13:34:11 +0900
Subject: [PATCH v5 1/3] Split more regression tests of pg_stat_statements

This commit expands more the refactoring of the regression tests of
pg_stat_statements, with tests moved out of pg_stat_statements.sql to
separate files, as of:
- select.sql is mainly the former pg_stat_statements.sql, renamed.
- dml.sql for INSERT/UPDATE/DELETE and MERGE
- user_activity, to test role-level checks and stat resets.
- wal, to check the WAL generation.
---
 contrib/pg_stat_statements/Makefile   |   4 +-
 contrib/pg_stat_statements/expected/dml.out   | 148 
 .../expected/pg_stat_statements.out   | 768 --
 .../pg_stat_statements/expected/select.out| 411 ++
 .../expected/user_activity.out| 199 +
 contrib/pg_stat_statements/expected/wal.out   |  35 +
 contrib/pg_stat_statements/meson.build|   5 +-
 contrib/pg_stat_statements/sql/dml.sql|  78 ++
 .../sql/pg_stat_statements.sql| 300 ---
 contrib/pg_stat_statements/sql/select.sql | 146 
 .../pg_stat_statements/sql/user_activity.sql  |  65 ++
 contrib/pg_stat_statements/sql/wal.sql|  22 +
 12 files changed, 1110 insertions(+), 1071 deletions(-)
 create mode 100644 contrib/pg_stat_statements/expected/dml.out
 delete mode 100644 contrib/pg_stat_statements/expected/pg_stat_statements.out
 create mode 100644 contrib/pg_stat_statements/expected/select.out
 create mode 100644 contrib/pg_stat_statements/expected/user_activity.out
 create mode 100644 contrib/pg_stat_statements/expected/wal.out
 create mode 100644 contrib/pg_stat_statements/sql/dml.sql
 delete mode 100644 contrib/pg_stat_statements/sql/pg_stat_statements.sql
 create mode 100644 contrib/pg_stat_statements/sql/select.sql
 create mode 100644 contrib/pg_stat_statements/sql/user_activity.sql
 create mode 100644 contrib/pg_stat_statements/sql/wal.sql

diff --git a/contrib/pg_stat_statements/Makefile b/contrib/pg_stat_statements/Makefile
index 69fbc6a858..5578a9dd4e 100644
--- a/contrib/pg_stat_statements/Makefile
+++ b/contrib/pg_stat_statements/Makefile
@@ -17,8 +17,8 @@ PGFILEDESC = "pg_stat_statements - execution statistics of SQL statements"
 LDFLAGS_SL += $(filter -lm, $(LIBS))
 
 REGRESS_OPTS = --temp-config $(top_srcdir)/contrib/pg_stat_statements/pg_stat_statements.conf
-REGRESS = pg_stat_statements cursors utility level_tracking planning \
-	cleanup oldextversions
+REGRESS = select dml cursors utility level_tracking planning \
+	user_activity wal cleanup oldextversions
 # Disabled because these tests require "shared_preload_libraries=pg_stat_statements",
 # which typical installcheck users do not have (e.g. buildfarm clients).
 NO_INSTALLCHECK = 1
diff --git a/contrib/pg_stat_statements/expected/dml.out b/contrib/pg_stat_statements/expected/dml.out
new file mode 100644
index 00..803d993e85
--- /dev/null
+++ b/contrib/pg_stat_statements/expected/dml.out
@@ -0,0 +1,148 @@
+--
+-- DMLs on test table
+--
+SET pg_stat_statements.track_utility = FALSE;
+-- utility "create table" should not be shown
+CREATE TEMP TABLE pgss_dml_tab (a int, b char(20));
+INSERT INTO pgss_dml_tab VALUES(generate_series(1, 10), 'aaa');
+UPDATE pgss_dml_tab SET b = 'bbb' WHERE a > 7;
+DELETE FROM pgss_dml_tab WHERE a > 9;
+-- explicit transaction
+BEGIN;
+UPDATE pgss_dml_tab SET b = '111' WHERE a = 1 ;
+COMMIT;
+BEGIN \;
+UPDATE pgss_dml_tab SET b = '222' WHERE a = 2 \;
+COMMIT ;
+UPDATE pgss_dml_tab SET b = '333' WHERE a = 3 \;
+UPDATE pgss_dml_tab SET b = '444' WHERE a = 4 ;
+BEGIN \;
+UPDATE pgss_dml_tab SET b = '555' WHERE a = 5 \;
+UPDATE pgss_dml_tab SET b = '666' WHERE a = 6 \;
+COMMIT ;
+-- many INSERT values
+INSERT INTO pgss_dml_tab (a, b) VALUES (1, 'a'), (2, 'b'), (3, 'c');
+-- SELECT with constants
+SELECT * FROM pgss_dml_tab WHERE a > 5 ORDER BY a ;
+ a |  b   
+---+--
+ 6 | 666 
+ 7 | aaa 
+ 8 | bbb 
+ 9 | bbb 
+(4 rows)
+
+SELECT *
+  FROM pgss_dml_tab
+  WHERE a > 

Re: Normalization of utility queries in pg_stat_statements

2023-02-19 Thread Michael Paquier
On Mon, Feb 20, 2023 at 11:32:23AM +0900, Michael Paquier wrote:
> These last ones are staying around for a few more weeks, until the
> middle of the next CF, I guess.  After all this is done, the final
> changes are very short, showing the effects of the normalization, as
> of:
>  6 files changed, 45 insertions(+), 35 deletions(-)

With the patches..
--
Michael
From da6b7a7c5199c4db6ea8906df6c0e256559735f1 Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Wed, 8 Feb 2023 11:00:01 +0900
Subject: [PATCH v4 3/4] Apply normalization to A_Const and utilities in
 pg_stat_statements

Its value is now ignored and location is stored, so as it is possible to
apply query normalization across more query types:
- SET
- CALL
- COPY TO with queries
- View, matviews and CTAS
- EXPLAIN
- Triggers
- Rules
- Statistics
---
 src/include/nodes/parsenodes.h|   8 +-
 src/include/nodes/primnodes.h |   9 +-
 src/backend/nodes/queryjumblefuncs.c  |  23 +--
 doc/src/sgml/pgstatstatements.sgml|   7 +-
 .../pg_stat_statements/expected/cursors.out   |  14 +-
 .../expected/level_tracking.out   |  20 +-
 .../pg_stat_statements/expected/utility.out   | 174 --
 .../pg_stat_statements/pg_stat_statements.c   |   4 +-
 8 files changed, 117 insertions(+), 142 deletions(-)

diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index f7d7f10f7d..259e814253 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -3221,14 +3221,18 @@ typedef struct InlineCodeBlock
  * list contains copies of the expressions for all output arguments, in the
  * order of the procedure's declared arguments.  (outargs is never evaluated,
  * but is useful to the caller as a reference for what to assign to.)
+ * The transformed call state is not relevant in the query jumbling, only the
+ * function call is.
  * --
  */
 typedef struct CallStmt
 {
 	NodeTag		type;
 	FuncCall   *funccall;		/* from the parser */
-	FuncExpr   *funcexpr;		/* transformed call, with only input args */
-	List	   *outargs;		/* transformed output-argument expressions */
+	/* transformed call, with only input args */
+	FuncExpr   *funcexpr pg_node_attr(query_jumble_ignore);
+	/* transformed output-argument expressions */
+	List	   *outargs pg_node_attr(query_jumble_ignore);
 } CallStmt;
 
 typedef struct CallContext
diff --git a/src/include/nodes/primnodes.h b/src/include/nodes/primnodes.h
index 1be1642d92..188ff1d249 100644
--- a/src/include/nodes/primnodes.h
+++ b/src/include/nodes/primnodes.h
@@ -128,8 +128,10 @@ typedef struct TableFunc
  * CREATE MATERIALIZED VIEW
  *
  * For CREATE MATERIALIZED VIEW, viewQuery is the parsed-but-not-rewritten
- * SELECT Query for the view; otherwise it's NULL.  (Although it's actually
- * Query*, we declare it as Node* to avoid a forward reference.)
+ * SELECT Query for the view; otherwise it's NULL.  This is irrelevant in
+ * the query jumbling as CreateTableAsStmt already includes a reference to
+ * its own Query, so ignore it.  (Although it's actually Query*, we declare
+ * it as Node* to avoid a forward reference.)
  */
 typedef struct IntoClause
 {
@@ -141,7 +143,8 @@ typedef struct IntoClause
 	List	   *options;		/* options from WITH clause */
 	OnCommitAction onCommit;	/* what do we do at COMMIT? */
 	char	   *tableSpaceName; /* table space to use, or NULL */
-	Node	   *viewQuery;		/* materialized view's SELECT query */
+	/* materialized view's SELECT query */
+	Node	   *viewQuery pg_node_attr(query_jumble_ignore);
 	bool		skipData;		/* true for WITH NO DATA */
 } IntoClause;
 
diff --git a/src/backend/nodes/queryjumblefuncs.c b/src/backend/nodes/queryjumblefuncs.c
index d7fd72d70f..0f08f4c75e 100644
--- a/src/backend/nodes/queryjumblefuncs.c
+++ b/src/backend/nodes/queryjumblefuncs.c
@@ -323,29 +323,8 @@ _jumbleA_Const(JumbleState *jstate, Node *node)
 	if (!expr->isnull)
 	{
 		JUMBLE_FIELD(val.node.type);
-		switch (nodeTag(>val))
-		{
-			case T_Integer:
-JUMBLE_FIELD(val.ival.ival);
-break;
-			case T_Float:
-JUMBLE_STRING(val.fval.fval);
-break;
-			case T_Boolean:
-JUMBLE_FIELD(val.boolval.boolval);
-break;
-			case T_String:
-JUMBLE_STRING(val.sval.sval);
-break;
-			case T_BitString:
-JUMBLE_STRING(val.bsval.bsval);
-break;
-			default:
-elog(ERROR, "unrecognized node type: %d",
-	 (int) nodeTag(>val));
-break;
-		}
 	}
+	JUMBLE_LOCATION(location);
 }
 
 static void
diff --git a/doc/src/sgml/pgstatstatements.sgml b/doc/src/sgml/pgstatstatements.sgml
index efc36da602..cee9376916 100644
--- a/doc/src/sgml/pgstatstatements.sgml
+++ b/doc/src/sgml/pgstatstatements.sgml
@@ -489,11 +489,12 @@
Plannable queries (that is, SELECT, INSERT,
UPDATE, DELETE, and MERGE) are combined into a single
pg_stat_statements entry whenever they have identical query
-   structures according to an internal hash calculation.  Typically, two
+   

Re: Normalization of utility queries in pg_stat_statements

2023-02-19 Thread Michael Paquier
On Fri, Feb 17, 2023 at 09:36:27AM +0100, Drouvot, Bertrand wrote:
> On 2/17/23 3:35 AM, Michael Paquier wrote:
>> 0001 looks quite committable at this stage, and that's independent on
>> the rest.  At the end this patch creates four new test files that are
>> extended in the next patches: utility, planning, track and cleanup.
> 
> Thanks! LGTM.

Thanks.  I have applied the set of regression tests in 0001 and 0002.
Note that I have changed the order of the attributes when querying
pg_stat_statements, to make easier to follow the diffs generated by
the normalization.  The unaligned mode would be another option, but
it makes not much sense as long as there are no more than two fields
with variable lengths.  Some extra notes about that:
- Should the test for the validation WAL generation metrics be moved
out?  I am not sure that it makes much sense to separate it as it has
a short purpose.
- Same issue with user activity, which creates a few roles and makes
sure that their activity is tracked?  We don't look at the userid in
this case, which does not make much sense to me.
- Same issue with locking clauses, worth a file of their own?

The main file is still named pg_stat_statements.sql, perhaps it should
be renamed to something more generic, like general.sql?  Or perhaps we
could just split the main file with a select.sql (with locking
clauses) and a dml.sql?

>> I am wondering if others have an opinion to share about that, but,
>> yes, 0004 seems enough to begin with.  We could always study more
>> normalization areas in future releases, taking it slowly.
> 
> Agree.

These last ones are staying around for a few more weeks, until the
middle of the next CF, I guess.  After all this is done, the final
changes are very short, showing the effects of the normalization, as
of:
 6 files changed, 45 insertions(+), 35 deletions(-)
--
Michael


signature.asc
Description: PGP signature


Re: Normalization of utility queries in pg_stat_statements

2023-02-17 Thread Drouvot, Bertrand

Hi,

On 2/17/23 3:35 AM, Michael Paquier wrote:

On Thu, Feb 16, 2023 at 10:55:32AM +0100, Drouvot, Bertrand wrote:

In the new pg_stat_statements.sql? That way pg_stat_statements.sql would always 
behave
with default values for those (currently we are setting both of them as non 
default).

Then, with the default values in place, if we feel that some tests
are missing we could add them in > utility.sql or planning.sql
accordingly.


I am not sure about this part, TBH, so I have left these as they are.

Anyway, while having a second look at that, I have noticed that it is
possible to extract as an independent piece all the tests related to
level tracking.  Things are worse than I thought initially, actually,
because we had test scenarios mixing planning and level tracking, but
the tests don't care about measuring plans at all, see around FETCH
FORWARD, meaning that queries on the table pg_stat_statements have
just been copy-pasted around for the last few years.  There were more
tests that used "test" for a table name ;)

I have been pondering about this part, and the tracking matters for DO
blocks and PL functions, so I have moved all these cases into a new,
separate file.  There is a bit more that can be done, for WAL tracking
and roles near the end of pg_stat_statements.sql, but I have left that
out for now.  I have checked the output of the tests before and after
the refactoring for quite a bit of time, and the outputs match so
there is no loss of coverage.

0001 looks quite committable at this stage, and that's independent on
the rest.  At the end this patch creates four new test files that are
extended in the next patches: utility, planning, track and cleanup.



Thanks! LGTM.


0002:

Produces:
v2-0002-Add-more-test-for-utility-queries-in-pg_stat_stat.patch:834: trailing 
whitespace.
CREATE VIEW view_stats_1 AS
v2-0002-Add-more-test-for-utility-queries-in-pg_stat_stat.patch:838: trailing 
whitespace.
CREATE VIEW view_stats_1 AS
warning: 2 lines add whitespace errors.


Thanks, fixed.



Thanks!


+-- SET TRANSACTION ISOLATION
+BEGIN;
+SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
+SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
+SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;

What about adding things like "SET SESSION CHARACTERISTICS AS
TRANSACTION..." too?


That's a good idea.  It is again one of these fancy cases, better to
keep a track of them in the long-term..



Right.

002 LGTM.


0003 and 0004:
Thanks for keeping 0003 that's useful to see the impact of A_Const 
normalization.

Looking at the diff they produce, I also do think that 0004 is what
could be done for PG16.


I am wondering if others have an opinion to share about that, but,
yes, 0004 seems enough to begin with.  We could always study more
normalization areas in future releases, taking it slowly.


Agree.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Normalization of utility queries in pg_stat_statements

2023-02-16 Thread Drouvot, Bertrand

Hi,

On 2/16/23 1:34 AM, Michael Paquier wrote:

While wondering about this stuff about the last few days and
discussing with bertrand, I have changed my mind on the point that
there is no need to be that aggressive yet with the normalization of
the A_Const nodes, because the query string normalization of
pg_stat_statements is not prepared yet to handle cases where a A_Const
value uses a non-quoted value with whitespaces.  The two cases where I
saw an impact is on the commands that can define an isolation level:
SET TRANSACTION and BEGIN.

For example, applying normalization to A_Const nodes does the
following as of HEAD:
1) BEGIN TRANSACTION READ ONLY, READ WRITE, DEFERRABLE, NOT DEFERRABLE;
BEGIN TRANSACTION $1 ONLY, $2 WRITE, $3, $4 DEFERRABLE
2) SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
SET TRANSACTION ISOLATION LEVEL $1 COMMITTED

On top of that, specifying a different isolation level may cause these
commands to be grouped, which is not really cool.  All that could be
done incrementally later on, in 17~ or later depending on the
adjustments that make sense.



Thanks for those patches!

Yeah, agree about the proposed approach.



0002 has been
completed with a couple of commands to track all the commands with
A_Const, so as we never lose sight of what happens.  0004 is what I
think could be done for PG16, where normalization affects only Const.
At the end of the day, this reflects the following commands that use
Const nodes because they use directly queries, so the same rules as
SELECT and DMLs apply to them:
- DECLARE
- EXPLAIN
- CREATE MATERIALIZED VIEW
- CTAS, SELECT INTO



0001:

I like the idea of splitting the existing tests in dedicated files.

What do you think about removing:

"
SET pg_stat_statements.track_utility = FALSE;
SET pg_stat_statements.track_planning = TRUE;
"

In the new pg_stat_statements.sql? That way pg_stat_statements.sql would always 
behave
with default values for those (currently we are setting both of them as non 
default).

Then, with the default values in place, if we feel that some tests are missing 
we could add them in
utility.sql or planning.sql accordingly.

0002:

Produces:
v2-0002-Add-more-test-for-utility-queries-in-pg_stat_stat.patch:834: trailing 
whitespace.
CREATE VIEW view_stats_1 AS
v2-0002-Add-more-test-for-utility-queries-in-pg_stat_stat.patch:838: trailing 
whitespace.
CREATE VIEW view_stats_1 AS
warning: 2 lines add whitespace errors.

+-- SET TRANSACTION ISOLATION
+BEGIN;
+SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
+SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
+SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;

What about adding things like "SET SESSION CHARACTERISTICS AS TRANSACTION..." 
too?

0003 and 0004:
Thanks for keeping 0003 that's useful to see the impact of A_Const 
normalization.

Looking at the diff they produce, I also do think that 0004 is what
could be done for PG16.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Normalization of utility queries in pg_stat_statements

2023-02-15 Thread Michael Paquier
On Wed, Feb 08, 2023 at 12:05:24PM +0900, Michael Paquier wrote:
> Thoughts and comments are welcome.  0001 and 0002 are useful on their
> own to keep track of utilities that use Const and A_Const after going
> through the query jumbling, even if an approach based on query string
> or the automated query jumbling for utilities is used (the query
> string approach a bit its value).  I'll add that to the next commit
> fest.

While wondering about this stuff about the last few days and
discussing with bertrand, I have changed my mind on the point that
there is no need to be that aggressive yet with the normalization of
the A_Const nodes, because the query string normalization of
pg_stat_statements is not prepared yet to handle cases where a A_Const
value uses a non-quoted value with whitespaces.  The two cases where I
saw an impact is on the commands that can define an isolation level:
SET TRANSACTION and BEGIN.

For example, applying normalization to A_Const nodes does the
following as of HEAD:
1) BEGIN TRANSACTION READ ONLY, READ WRITE, DEFERRABLE, NOT DEFERRABLE;
BEGIN TRANSACTION $1 ONLY, $2 WRITE, $3, $4 DEFERRABLE
2) SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
SET TRANSACTION ISOLATION LEVEL $1 COMMITTED

On top of that, specifying a different isolation level may cause these
commands to be grouped, which is not really cool.  All that could be
done incrementally later on, in 17~ or later depending on the
adjustments that make sense.

Attached is an updated patch set.  0003 is basically the same as v3,
that I have kept around for clarity in case one wants to see the
effect of a A_Const normalization to all the related commands, though
I am not proposing that for an upstream integration.  0002 has been
completed with a couple of commands to track all the commands with
A_Const, so as we never lose sight of what happens.  0004 is what I
think could be done for PG16, where normalization affects only Const.
At the end of the day, this reflects the following commands that use
Const nodes because they use directly queries, so the same rules as
SELECT and DMLs apply to them:
- DECLARE
- EXPLAIN
- CREATE MATERIALIZED VIEW
- CTAS, SELECT INTO

Comments and thoughts welcome.
Thanks,
--
Michael
From eb61fe10d0d7399573ebb3a911aed4114742cf25 Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Tue, 7 Feb 2023 14:33:20 +0900
Subject: [PATCH v2 1/4] Refactor regression tests of pg_stat_statements

pg_stat_statements.sql acts as the main file for all the core tests of
the module, but things have become a bit hairy over the years as some of
the sub-scenarios tested rely on assumptions that may have been set in a
completely different block, like a GUC setup or a different relation.

This commit refactors the tests of pg_stat_statements a bit, by moving a
few test cases out of pg_stat_statements.sql into their own file, as of:
- Planning-related tests in planning.sql.
- Utilities in utility.sql.

Test scenarios and their results remain the same as the originals.
---
 contrib/pg_stat_statements/Makefile   |   2 +-
 .../pg_stat_statements/expected/cleanup.out   |   1 +
 .../expected/pg_stat_statements.out   | 284 ++
 .../pg_stat_statements/expected/planning.out  | 195 
 .../pg_stat_statements/expected/utility.out   |  72 +
 contrib/pg_stat_statements/meson.build|   3 +
 contrib/pg_stat_statements/sql/cleanup.sql|   1 +
 .../sql/pg_stat_statements.sql| 118 +---
 contrib/pg_stat_statements/sql/planning.sql   |  78 +
 contrib/pg_stat_statements/sql/utility.sql|  34 +++
 10 files changed, 417 insertions(+), 371 deletions(-)
 create mode 100644 contrib/pg_stat_statements/expected/cleanup.out
 create mode 100644 contrib/pg_stat_statements/expected/planning.out
 create mode 100644 contrib/pg_stat_statements/expected/utility.out
 create mode 100644 contrib/pg_stat_statements/sql/cleanup.sql
 create mode 100644 contrib/pg_stat_statements/sql/planning.sql
 create mode 100644 contrib/pg_stat_statements/sql/utility.sql

diff --git a/contrib/pg_stat_statements/Makefile b/contrib/pg_stat_statements/Makefile
index edc40c8bbf..78dc4c1d07 100644
--- a/contrib/pg_stat_statements/Makefile
+++ b/contrib/pg_stat_statements/Makefile
@@ -17,7 +17,7 @@ PGFILEDESC = "pg_stat_statements - execution statistics of SQL statements"
 LDFLAGS_SL += $(filter -lm, $(LIBS))
 
 REGRESS_OPTS = --temp-config $(top_srcdir)/contrib/pg_stat_statements/pg_stat_statements.conf
-REGRESS = pg_stat_statements oldextversions
+REGRESS = pg_stat_statements utility planning cleanup oldextversions
 # Disabled because these tests require "shared_preload_libraries=pg_stat_statements",
 # which typical installcheck users do not have (e.g. buildfarm clients).
 NO_INSTALLCHECK = 1
diff --git a/contrib/pg_stat_statements/expected/cleanup.out b/contrib/pg_stat_statements/expected/cleanup.out
new file mode 100644
index 00..36bec35c40
--- /dev/null
+++ 

Normalization of utility queries in pg_stat_statements

2023-02-07 Thread Michael Paquier
Hi all,
(Adding Bertrand in CC.)

$subject is a follow-up of the automation of query jumbling for
utilities and DDLs, and attached is a set of patches that apply
normalization to DDL queries across the board, for all utilities.

This relies on tracking the location of A_Const nodes while removing
from the query jumbling computation the values attached to the node,
as as utility queries can show be stored as normalized in
pg_stat_statements with some $N parameters.  The main case behind
doing that is of course monitoring, where we have seen some user
instances willing to get more information but see pg_stat_statements
as a bottleneck because the query ID of utility queries are based on
the computation of their string, and is value-sensitive.  That's the
case mentioned by Bertrand Drouvot for CALL and SET where workloads
full of these easily bloat pg_stat_statements, where we concluded
about more automation in this area (so here it is):
https://www.postgresql.org/message-id/36e5bffe-e989-194f-85c8-06e7bc88e6f7%40amazon.com

For example, this makes possible the following grouping:
- CALL func(1,2); CALL func(1,3); => CALL func($1,$2)
- EXPLAIN SELECT 1; EXPLAIN SELECT 1; => EXPLAIN SELECT $1;
- CREATE MATERIALIZED VIEW aam AS SELECT 1; becomes "CREATE
MATERIALIZED VIEW aam AS SELECT $1".

Query jumbling for DDLs and utilities happens now automatically, still
are not represented correctly in pg_stat_statements (one bit of
documentation I missed previously refers to the fact that these depend
on their query strings, which is not the case yet).

By the way, while looking at all that, I have really underestimated
the use of Const nodes in utilities, as some queries can finish with
the same query ID even if different values are stored in a query,
still don't show up as normalized in pg_stat_statements, so the
current state of HEAD is not good, though you would need to use the
same object name to a conflict for most of them.  So that's my mistake
here with 3db72eb.  If folks think that we'd better have a revert of
this automated query jumbling facility based on this argument, that
would be fine for me, as well.  The main case I have noticed in this
area is EXPLAIN, by the way.  Note that it is actually easy to move to
the ~15 approach of having a query ID depending on the Const node
values for DDLs, by having a custom implementation in
queryjumblefuncs.c for Const nodes, where we apply the constant value
and don't store a location for normalization if a query has a utility
once this information is stored in a JumbleState.

This rule influences various DDLs, as well, once it gets applied
across the board, and it's been some work to identify all of them, but
I think that I have caught them all as the regression database offers
all the possible patterns:
- CREATE VIEW, CTAS, CREATE MATERIALIZED VIEW which have Const nodes
depending on their attached queries, for various clauses.
- ALTER TABLE/INDEX/FOREIGN with DEFAULT, SET components.
- CREATE TABLE with partition bounds.
- BEGIN and ABORT, with transaction commands getting grouped
together.

The attached patch set includes as a set of regression tests for
pg_stat_statements for *all* the utility queries that have either
Const or A_Const nodes, so as one can see the effect that all this
stuff has.  This is based on a diff of the contents of
pg_stat_statements on the regression database once all these
normalization rules are applied.

Compilation of a Const can also be made depending on the type node.
However, all that makes no sense if applying the same normalization
rules to all the queries across the board, because all the queries
would follow the same rules.  That's the critical bit IMO.  From what
I get, the bloat of pg_stat_statements for all utilities is something
that would be helpful for all such queries, still different things
could be done on a per-node basis.  Perhaps this is too aggressive as
it is and people don't like it, though, so feedback is welcome.  I'd
like to think that maximizing grouping is nice though, because it
leads to no actual loss of information on the workload pattern for the
queries involved, AFAIU.  This sentence may be overoptimistic.

So, attached is a patch set, that does the following:
- 0001 is a refactoring of the regression tests of
pg_stat_statements by splitting a bit the tests.  I bumped into that
while getting confused at how the tests are now when it comes to the
handling of utilities and track_planning, where these tests silently
rely on other parts of the same file with different GUC settings.
This refactoring is useful on its own, IMO, and the tests show the
same output as previously.
- 0002 is the addition of tests in pg_stat_statements for all the DDL
and utility patterns that make use of Const and A_Const nodes.  Even
if query jumbling of utilities is done through their text string or
their nodes, this is also useful.
- 0003 is the code of the feature, that switches pg_stat_statements to
properly normalize utility