On 28.01.2025 20:21, Sami Imseih wrote:
All the changes mentioned above are included in the v13 patch. Since the
patch status is 'Ready for Committer,' I believe it is now better for
upstream inclusion, with improved details in tests and documentation. Do
you have any further suggestions?
I am not quite clear on the sample_1.out. I do like the idea of separating
the sample tests, but I was thinking of something a bit more simple.
What do you think of my attached, sampling.sql, test? It tests sample
rate in both
simple and extended query protocols and for both top level and
nested levels?


That sounds great! I've added your sample.sql file to my v14 patch. However, I was focused on testing sample_rate values between 0 and 1. The approach I came up with was using the sample{_1}.out files.  I’ve removed the test involving those files for now, but if the committer prefers to keep it, I can reintroduce them.



If anyone has the capability to run this benchmark on machines with more
CPUs or with different queries, it would be nice. I’d appreciate any
suggestions or feedback.
I wanted to share some additional benchmarks I ran as well
on a r8g.48xlarge ( 192 vCPUs, 1,536 GiB of memory) configured
with 16GB of shared_buffers. I also attached the benchmark.sh
script used to generate the output.
The benchmark is running the select-only pgbench workload,
so we have a single heavily contentious entry, which is the
worst case.

The test shows that the spinlock (SpinDelay waits)
becomes an issue at high connection counts and will
become worse on larger machines. A sample_rate going from
1 to .75 shows a 60% improvement; but this is on a single
contentious entry. Most workloads will likely not see this type
of improvement. I also could not really observe
this type of difference on smaller machines ( i.e. 32 vCPUs),
as expected.

## init
pgbench -i -s500

### 192 connections
pgbench -c192 -j20 -S -Mprepared -T120 --progress 10

sample_rate = 1
tps = 484338.769799 (without initial connection time)
waits
-----
   11107  SpinDelay
    9568  CPU
     929  ClientRead
      13  DataFileRead
       3  BufferMapping

sample_rate = .75
tps = 909547.562124 (without initial connection time)
waits
-----
   12079  CPU
    4781  SpinDelay
    2100  ClientRead

sample_rate = .5
tps = 1028594.555273 (without initial connection time)
waits
-----
   13253  CPU
    3378  ClientRead
     174  SpinDelay

sample_rate = .25
tps = 1019507.126313 (without initial connection time)
waits
-----
   13397  CPU
    3423  ClientRead

sample_rate = 0
tps = 1015425.288538 (without initial connection time)
waits
-----
   13106  CPU
    3502  ClientRead

### 32 connections
pgbench -c32 -j20 -S -Mprepared -T120 --progress 10

sample_rate = 1
tps = 620667.049565 (without initial connection time)
waits
-----
    1782  CPU
     560  ClientRead

sample_rate = .75
tps = 620663.131347 (without initial connection time)
waits
-----
    1736  CPU
     554  ClientRead

sample_rate = .5
tps = 624094.688239 (without initial connection time)
waits
-----
    1741  CPU
     648  ClientRead

sample_rate = .25
tps = 628638.538204 (without initial connection time)
waits
-----
    1702  CPU
     576  ClientRead

sample_rate = 0
tps = 630483.464912 (without initial connection time)
waits
-----
    1638  CPU
     574  ClientRead

Regards,

Sami


Thank you so much for benchmarking this on a pretty large machine with a large number of CPUs. The results look fantastic, and I truly appreciate your effort.

BWT, I realized that the 'sampling' test needs to be added not only to the Makefile but also to meson.build. I've included that in the v14 patch.

--
Best regards,
Ilia Evdokimov,
Tantor Labs LLC.
From 7c9d45325e29a65259740d5255d39f9f57ee6fba Mon Sep 17 00:00:00 2001
From: Ilia Evdokimov <ilya.evdoki...@tantorlabs.com>
Date: Tue, 28 Jan 2025 23:04:34 +0300
Subject: [PATCH] Allow setting sample rate for pg_stat_statements

New configuration parameter pg_stat_statements.sample_rate makes it
possible to track just a fraction of the queries meeting the configured
threshold, to reduce the amount of tracking.
---
 contrib/pg_stat_statements/Makefile           |   2 +-
 .../pg_stat_statements/expected/sampling.out  | 174 ++++++++++++++++++
 contrib/pg_stat_statements/meson.build        |   1 +
 .../pg_stat_statements/pg_stat_statements.c   |  59 +++++-
 contrib/pg_stat_statements/sql/sampling.sql   |  50 +++++
 doc/src/sgml/pgstatstatements.sgml            |  19 ++
 6 files changed, 297 insertions(+), 8 deletions(-)
 create mode 100644 contrib/pg_stat_statements/expected/sampling.out
 create mode 100644 contrib/pg_stat_statements/sql/sampling.sql

diff --git a/contrib/pg_stat_statements/Makefile b/contrib/pg_stat_statements/Makefile
index 241c02587b..b70bdfaf2d 100644
--- a/contrib/pg_stat_statements/Makefile
+++ b/contrib/pg_stat_statements/Makefile
@@ -20,7 +20,7 @@ LDFLAGS_SL += $(filter -lm, $(LIBS))
 REGRESS_OPTS = --temp-config $(top_srcdir)/contrib/pg_stat_statements/pg_stat_statements.conf
 REGRESS = select dml cursors utility level_tracking planning \
 	user_activity wal entry_timestamp privileges extended \
-	parallel cleanup oldextversions
+	parallel sampling cleanup oldextversions
 # Disabled because these tests require "shared_preload_libraries=pg_stat_statements",
 # which typical installcheck users do not have (e.g. buildfarm clients).
 NO_INSTALLCHECK = 1
diff --git a/contrib/pg_stat_statements/expected/sampling.out b/contrib/pg_stat_statements/expected/sampling.out
new file mode 100644
index 0000000000..2204215f64
--- /dev/null
+++ b/contrib/pg_stat_statements/expected/sampling.out
@@ -0,0 +1,174 @@
+--
+-- sample statements
+--
+-- top-level tracking - simple query protocol
+SHOW pg_stat_statements.track;
+ pg_stat_statements.track 
+--------------------------
+ top
+(1 row)
+
+SET pg_stat_statements.sample_rate = 0.0;
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+ t 
+---
+ t
+(1 row)
+
+SELECT 1 AS "int";
+ int 
+-----
+   1
+(1 row)
+
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+ query | calls 
+-------+-------
+(0 rows)
+
+SET pg_stat_statements.sample_rate = 1.0;
+SELECT 1 AS "int";
+ int 
+-----
+   1
+(1 row)
+
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+       query        | calls 
+--------------------+-------
+ SELECT $1 AS "int" |     1
+(1 row)
+
+-- top-level tracking - extended query protocol
+SET pg_stat_statements.sample_rate = 0.0;
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+ t 
+---
+ t
+(1 row)
+
+SELECT 1 \parse stmt
+\bind_named stmt \g
+ ?column? 
+----------
+        1
+(1 row)
+
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+ query | calls 
+-------+-------
+(0 rows)
+
+SET pg_stat_statements.sample_rate = 1.0;
+\bind_named stmt \g
+ ?column? 
+----------
+        1
+(1 row)
+
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+   query   | calls 
+-----------+-------
+ SELECT $1 |     1
+(1 row)
+
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+ t 
+---
+ t
+(1 row)
+
+DEALLOCATE stmt;
+-- nested tracking - simple query protocol
+SET pg_stat_statements.track = "all";
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+ t 
+---
+ t
+(1 row)
+
+SET pg_stat_statements.sample_rate = 1;
+EXPLAIN (COSTS OFF) SELECT 1;
+ QUERY PLAN 
+------------
+ Result
+(1 row)
+
+EXPLAIN (COSTS OFF) SELECT 1;
+ QUERY PLAN 
+------------
+ Result
+(1 row)
+
+SET pg_stat_statements.sample_rate = 0;
+EXPLAIN (COSTS OFF) SELECT 1;
+ QUERY PLAN 
+------------
+ Result
+(1 row)
+
+EXPLAIN (COSTS OFF) SELECT 1;
+ QUERY PLAN 
+------------
+ Result
+(1 row)
+
+SELECT toplevel, calls, query FROM pg_stat_statements
+  ORDER BY query COLLATE "C";
+ toplevel | calls |                       query                        
+----------+-------+----------------------------------------------------
+ t        |     2 | EXPLAIN (COSTS OFF) SELECT $1
+ f        |     2 | SELECT $1
+ t        |     1 | SELECT pg_stat_statements_reset() IS NOT NULL AS t
+ t        |     2 | SET pg_stat_statements.sample_rate = $1
+(4 rows)
+
+-- nested tracking - extended query protocol
+SET pg_stat_statements.track = "all";
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+ t 
+---
+ t
+(1 row)
+
+SET pg_stat_statements.sample_rate = 1;
+EXPLAIN (COSTS OFF) SELECT 1; \parse stmt
+ QUERY PLAN 
+------------
+ Result
+(1 row)
+
+\bind_named stmt \g
+ QUERY PLAN 
+------------
+ Result
+(1 row)
+
+\bind_named stmt \g
+ QUERY PLAN 
+------------
+ Result
+(1 row)
+
+SET pg_stat_statements.sample_rate = 0;
+\bind_named stmt \g
+ QUERY PLAN 
+------------
+ Result
+(1 row)
+
+\bind_named stmt \g
+ QUERY PLAN 
+------------
+ Result
+(1 row)
+
+SELECT toplevel, calls, query FROM pg_stat_statements
+  ORDER BY query COLLATE "C";
+ toplevel | calls |                  query                  
+----------+-------+-----------------------------------------
+ t        |     2 | EXPLAIN (COSTS OFF) SELECT $1
+ f        |     3 | SELECT $1
+ t        |     1 | SET pg_stat_statements.sample_rate = $1
+(3 rows)
+
diff --git a/contrib/pg_stat_statements/meson.build b/contrib/pg_stat_statements/meson.build
index 4446af58c5..351354a6a5 100644
--- a/contrib/pg_stat_statements/meson.build
+++ b/contrib/pg_stat_statements/meson.build
@@ -54,6 +54,7 @@ tests += {
       'privileges',
       'extended',
       'parallel',
+      'sampling',
       'cleanup',
       'oldextversions',
     ],
diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index bebf8134eb..afd0537713 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -50,6 +50,7 @@
 #include "access/parallel.h"
 #include "catalog/pg_authid.h"
 #include "common/int.h"
+#include "common/pg_prng.h"
 #include "executor/instrument.h"
 #include "funcapi.h"
 #include "jit/jit.h"
@@ -256,6 +257,9 @@ typedef struct pgssSharedState
 /* Current nesting depth of planner/ExecutorRun/ProcessUtility calls */
 static int	nesting_level = 0;
 
+/* Is the current top-level query to be sampled? */
+static bool is_query_sampled = false;
+
 /* Saved hook values */
 static shmem_request_hook_type prev_shmem_request_hook = NULL;
 static shmem_startup_hook_type prev_shmem_startup_hook = NULL;
@@ -294,12 +298,14 @@ static bool pgss_track_utility = true;	/* whether to track utility commands */
 static bool pgss_track_planning = false;	/* whether to track planning
 											 * duration */
 static bool pgss_save = true;	/* whether to save stats across shutdown */
+static double pgss_sample_rate = 1.0; /* fraction of statements to track */
 
 
-#define pgss_enabled(level) \
+#define pgss_enabled(level, skip_sampling_check) \
 	(!IsParallelWorker() && \
 	(pgss_track == PGSS_TRACK_ALL || \
-	(pgss_track == PGSS_TRACK_TOP && (level) == 0)))
+	(pgss_track == PGSS_TRACK_TOP && (level) == 0)) && \
+	current_query_sampled(skip_sampling_check))
 
 #define record_gc_qtexts() \
 	do { \
@@ -373,6 +379,7 @@ static char *generate_normalized_query(JumbleState *jstate, const char *query,
 static void fill_in_constant_lengths(JumbleState *jstate, const char *query,
 									 int query_loc);
 static int	comp_location(const void *a, const void *b);
+static bool	current_query_sampled(bool skip);
 
 
 /*
@@ -414,6 +421,19 @@ _PG_init(void)
 							NULL,
 							NULL);
 
+	DefineCustomRealVariable("pg_stat_statements.sample_rate",
+							 "Fraction of queries to track.",
+							 NULL,
+							 &pgss_sample_rate,
+							 1.0,
+							 0.0,
+							 1.0,
+							 PGC_SUSET,
+							 0,
+							 NULL,
+							 NULL,
+							 NULL);
+
 	DefineCustomEnumVariable("pg_stat_statements.track",
 							 "Selects which statements are tracked by pg_stat_statements.",
 							 NULL,
@@ -836,7 +856,7 @@ pgss_post_parse_analyze(ParseState *pstate, Query *query, JumbleState *jstate)
 		prev_post_parse_analyze_hook(pstate, query, jstate);
 
 	/* Safety check... */
-	if (!pgss || !pgss_hash || !pgss_enabled(nesting_level))
+	if (!pgss || !pgss_hash || !pgss_enabled(nesting_level, true))
 		return;
 
 	/*
@@ -894,7 +914,7 @@ pgss_planner(Query *parse,
 	 * pgss_store needs it.  We also ignore query without queryid, as it would
 	 * be treated as a utility statement, which may not be the case.
 	 */
-	if (pgss_enabled(nesting_level)
+	if (pgss_enabled(nesting_level, false)
 		&& pgss_track_planning && query_string
 		&& parse->queryId != UINT64CONST(0))
 	{
@@ -999,7 +1019,8 @@ pgss_ExecutorStart(QueryDesc *queryDesc, int eflags)
 	 * counting of optimizable statements that are directly contained in
 	 * utility statements.
 	 */
-	if (pgss_enabled(nesting_level) && queryDesc->plannedstmt->queryId != UINT64CONST(0))
+	if (pgss_enabled(nesting_level, false) &&
+		queryDesc->plannedstmt->queryId != UINT64CONST(0))
 	{
 		/*
 		 * Set up to track total elapsed time in ExecutorRun.  Make sure the
@@ -1068,7 +1089,7 @@ pgss_ExecutorEnd(QueryDesc *queryDesc)
 	uint64		queryId = queryDesc->plannedstmt->queryId;
 
 	if (queryId != UINT64CONST(0) && queryDesc->totaltime &&
-		pgss_enabled(nesting_level))
+		pgss_enabled(nesting_level, false))
 	{
 		/*
 		 * Make sure stats accumulation is done.  (Note: it's okay if several
@@ -1111,7 +1132,7 @@ pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
 	uint64		saved_queryId = pstmt->queryId;
 	int			saved_stmt_location = pstmt->stmt_location;
 	int			saved_stmt_len = pstmt->stmt_len;
-	bool		enabled = pgss_track_utility && pgss_enabled(nesting_level);
+	bool		enabled = pgss_track_utility && pgss_enabled(nesting_level, false);
 
 	/*
 	 * Force utility statements to get queryId zero.  We do this even in cases
@@ -3011,3 +3032,27 @@ comp_location(const void *a, const void *b)
 
 	return pg_cmp_s32(l, r);
 }
+
+/*
+ * Determine whether the current query should be sampled.
+ *
+ * In some cases, sampling may need to be skipped entirely, such as during
+ * query parsing for normalization.
+ *
+ * At the beginning of each top-level statement, decide whether we'll
+ * sample this statement. If nested-statement tracking is enabled,
+ * either all nested statements will be tracked or none will.
+ */
+static bool
+current_query_sampled(bool skip)
+{
+	if (skip)
+		return true;
+
+	if (nesting_level == 0)
+		is_query_sampled = pgss_sample_rate != 0.0 &&
+			(pgss_sample_rate == 1.0 ||
+			pg_prng_double(&pg_global_prng_state) <= pgss_sample_rate);
+
+	return is_query_sampled;
+}
\ No newline at end of file
diff --git a/contrib/pg_stat_statements/sql/sampling.sql b/contrib/pg_stat_statements/sql/sampling.sql
new file mode 100644
index 0000000000..b09f45991b
--- /dev/null
+++ b/contrib/pg_stat_statements/sql/sampling.sql
@@ -0,0 +1,50 @@
+--
+-- sample statements
+--
+
+-- top-level tracking - simple query protocol
+SHOW pg_stat_statements.track;
+SET pg_stat_statements.sample_rate = 0.0;
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+SELECT 1 AS "int";
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+SET pg_stat_statements.sample_rate = 1.0;
+SELECT 1 AS "int";
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+
+-- top-level tracking - extended query protocol
+SET pg_stat_statements.sample_rate = 0.0;
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+SELECT 1 \parse stmt
+\bind_named stmt \g
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+SET pg_stat_statements.sample_rate = 1.0;
+\bind_named stmt \g
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+DEALLOCATE stmt;
+
+-- nested tracking - simple query protocol
+SET pg_stat_statements.track = "all";
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+SET pg_stat_statements.sample_rate = 1;
+EXPLAIN (COSTS OFF) SELECT 1;
+EXPLAIN (COSTS OFF) SELECT 1;
+SET pg_stat_statements.sample_rate = 0;
+EXPLAIN (COSTS OFF) SELECT 1;
+EXPLAIN (COSTS OFF) SELECT 1;
+SELECT toplevel, calls, query FROM pg_stat_statements
+  ORDER BY query COLLATE "C";
+
+-- nested tracking - extended query protocol
+SET pg_stat_statements.track = "all";
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+SET pg_stat_statements.sample_rate = 1;
+EXPLAIN (COSTS OFF) SELECT 1; \parse stmt
+\bind_named stmt \g
+\bind_named stmt \g
+SET pg_stat_statements.sample_rate = 0;
+\bind_named stmt \g
+\bind_named stmt \g
+SELECT toplevel, calls, query FROM pg_stat_statements
+  ORDER BY query COLLATE "C";
diff --git a/doc/src/sgml/pgstatstatements.sgml b/doc/src/sgml/pgstatstatements.sgml
index 501b468e9a..cd3fdaeffe 100644
--- a/doc/src/sgml/pgstatstatements.sgml
+++ b/doc/src/sgml/pgstatstatements.sgml
@@ -936,6 +936,25 @@
     </listitem>
    </varlistentry>
 
+   <varlistentry>
+    <term>
+     <varname>pg_stat_statements.sample_rate</varname> (<type>real</type>)
+     <indexterm>
+      <primary><varname>pg_stat_statements.sample_rate</varname> configuration parameter</primary>
+     </indexterm>
+    </term>
+
+    <listitem>
+     <para>
+      <varname>pg_stat_statements.sample_rate</varname> causes pg_stat_statements to only
+      track a fraction of the statements in each session. The default is <literal>1</literal>,
+      meaning track all the queries. Setting this to <literal>0</literal> disables sampled statements
+      tracking, the same as setting <varname>pg_stat_statements.track</varname> to <literal>none</literal>.
+      In case of nested statements, either all will be tracked or none. Only superusers can change this setting.
+     </para>
+    </listitem>
+   </varlistentry>
+
    <varlistentry>
     <term>
      <varname>pg_stat_statements.save</varname> (<type>boolean</type>)
-- 
2.34.1

Reply via email to