> > > On Thu, Mar 5, 2020 at 8:55 PM Kirill Bychik <kirill.byc...@gmail.com> > > > wrote: > > > > I wanted to keep the patch small and simple, and fit to practical > > > > needs. This patch is supposed to provide tuning assistance, catching > > > > an io heavy query in commit-bound situation. > > > > Total WAL usage per DB can be assessed rather easily using other means. > > > > Let's get this change into the codebase and then work on connecting > > > > WAL usage to (auto)vacuum stats. > > > > > > I agree that having a view of the full activity is a way bigger scope, > > > so it could be done later (and at this point in pg14), but I'm still > > > hoping that we can get insight of other backend WAL activity, such as > > > autovacuum, in pg13. > > > > How do you think this information should be exposed? Via the > > pg_stat_statement? > > That's unlikely, since autovacuum won't trigger any hook. I was > thinking on some new view for pgstats, similarly to the example I > showed previously. The implementation is straightforward, although > pg_stat_database is maybe not the best choice here.
After extensive thinking and some code diving, I did not manage to come up with a sane idea on how to expose data about autovacuum WAL usage. Must be the flu. > > Anyways, I believe this change could be bigger than FPI. I propose to > > plan a separate patch for it, or even add it to the TODO after the > > core patch of wal usage is merged. > > Just in case, if the problem is a lack of time, I'd be happy to help > on that if needed. Otherwise, I'll definitely not try to block any > progress for the feature as proposed. Please feel free to work on any extension of this patch idea. I lack both time and knowledge to do it all by myself. > > Please expect a new patch version next week, with FPI counters added. Please find attached patch version 003, with FP writes and minor corrections. Hope i use attachment versioning as expected in this group :) Test had been reworked, and I believe it should be stable now, the part which checks WAL is written and there is a correlation between affected rows and WAL records. I still have no idea how to test full-page writes against regular updates, it seems very unstable. Please share ideas if any. Thanks!
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index 4fa446ffa4..c975aa0dc7 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -42,6 +42,7 @@ #include "commands/progress.h" #include "commands/tablespace.h" #include "common/controldata_utils.h" +#include "executor/instrument.h" #include "miscadmin.h" #include "pg_trace.h" #include "pgstat.h" @@ -1231,6 +1232,13 @@ XLogInsertRecord(XLogRecData *rdata, ProcLastRecPtr = StartPos; XactLastRecEnd = EndPos; + /* Provide WAL update data to the instrumentation */ + if (inserted) + { + pgWalUsage.wal_bytes += rechdr->xl_tot_len; + pgWalUsage.wal_records++; + } + return EndPos; } diff --git a/src/backend/access/transam/xloginsert.c b/src/backend/access/transam/xloginsert.c index 2fa0a7f667..1f71cc0a76 100644 --- a/src/backend/access/transam/xloginsert.c +++ b/src/backend/access/transam/xloginsert.c @@ -25,6 +25,7 @@ #include "access/xloginsert.h" #include "catalog/pg_control.h" #include "common/pg_lzcompress.h" +#include "executor/instrument.h" #include "miscadmin.h" #include "pg_trace.h" #include "replication/origin.h" @@ -635,6 +636,11 @@ XLogRecordAssemble(RmgrId rmid, uint8 info, */ bkpb.fork_flags |= BKPBLOCK_HAS_IMAGE; + /* + * Report a full page image constructed for the WAL record + */ + pgWalUsage.wal_fp_records++; + /* * Construct XLogRecData entries for the page content. */ diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c index a753d6efa0..017367878f 100644 --- a/src/backend/executor/execParallel.c +++ b/src/backend/executor/execParallel.c @@ -62,6 +62,7 @@ #define PARALLEL_KEY_DSA UINT64CONST(0xE000000000000007) #define PARALLEL_KEY_QUERY_TEXT UINT64CONST(0xE000000000000008) #define PARALLEL_KEY_JIT_INSTRUMENTATION UINT64CONST(0xE000000000000009) +#define PARALLEL_KEY_WAL_USAGE UINT64CONST(0xE00000000000000A) #define PARALLEL_TUPLE_QUEUE_SIZE 65536 @@ -573,6 +574,7 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate, char *pstmt_space; char *paramlistinfo_space; BufferUsage *bufusage_space; + WalUsage *walusage_space; SharedExecutorInstrumentation *instrumentation = NULL; SharedJitInstrumentation *jit_instrumentation = NULL; int pstmt_len; @@ -646,6 +648,13 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate, mul_size(sizeof(BufferUsage), pcxt->nworkers)); shm_toc_estimate_keys(&pcxt->estimator, 1); + /* + * Same thing for WalUsage. + */ + shm_toc_estimate_chunk(&pcxt->estimator, + mul_size(sizeof(WalUsage), pcxt->nworkers)); + shm_toc_estimate_keys(&pcxt->estimator, 1); + /* Estimate space for tuple queues. */ shm_toc_estimate_chunk(&pcxt->estimator, mul_size(PARALLEL_TUPLE_QUEUE_SIZE, pcxt->nworkers)); @@ -728,6 +737,12 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate, shm_toc_insert(pcxt->toc, PARALLEL_KEY_BUFFER_USAGE, bufusage_space); pei->buffer_usage = bufusage_space; + /* Same for WalUsage. */ + walusage_space = shm_toc_allocate(pcxt->toc, + mul_size(sizeof(WalUsage), pcxt->nworkers)); + shm_toc_insert(pcxt->toc, PARALLEL_KEY_WAL_USAGE, bufusage_space); + pei->wal_usage = walusage_space; + /* Set up the tuple queues that the workers will write into. */ pei->tqueue = ExecParallelSetupTupleQueues(pcxt, false); @@ -1113,7 +1128,7 @@ ExecParallelFinish(ParallelExecutorInfo *pei) * finish, or we might get incomplete data.) */ for (i = 0; i < nworkers; i++) - InstrAccumParallelQuery(&pei->buffer_usage[i]); + InstrAccumParallelQuery(&pei->buffer_usage[i], &pei->wal_usage[i]); pei->finished = true; } @@ -1333,6 +1348,7 @@ ParallelQueryMain(dsm_segment *seg, shm_toc *toc) { FixedParallelExecutorState *fpes; BufferUsage *buffer_usage; + WalUsage *wal_usage; DestReceiver *receiver; QueryDesc *queryDesc; SharedExecutorInstrumentation *instrumentation; @@ -1408,7 +1424,9 @@ ParallelQueryMain(dsm_segment *seg, shm_toc *toc) /* Report buffer usage during parallel execution. */ buffer_usage = shm_toc_lookup(toc, PARALLEL_KEY_BUFFER_USAGE, false); - InstrEndParallelQuery(&buffer_usage[ParallelWorkerNumber]); + wal_usage = shm_toc_lookup (toc, PARALLEL_KEY_WAL_USAGE, false); + InstrEndParallelQuery(&buffer_usage[ParallelWorkerNumber], + &wal_usage[ParallelWorkerNumber]); /* Report instrumentation data if any instrumentation options are set. */ if (instrumentation != NULL) diff --git a/src/backend/executor/instrument.c b/src/backend/executor/instrument.c index bc1d42bf64..4bcb06f6e1 100644 --- a/src/backend/executor/instrument.c +++ b/src/backend/executor/instrument.c @@ -24,6 +24,11 @@ static void BufferUsageAdd(BufferUsage *dst, const BufferUsage *add); static void BufferUsageAccumDiff(BufferUsage *dst, const BufferUsage *add, const BufferUsage *sub); +WalUsage pgWalUsage; +static WalUsage save_pgWalUsage; + +static void WalUsageAdd(WalUsage *dst, WalUsage *add); +static void WalUsageAccumDiff(WalUsage *dst, const WalUsage *add, const WalUsage *sub); /* Allocate new instrumentation structure(s) */ Instrumentation * @@ -33,15 +38,17 @@ InstrAlloc(int n, int instrument_options) /* initialize all fields to zeroes, then modify as needed */ instr = palloc0(n * sizeof(Instrumentation)); - if (instrument_options & (INSTRUMENT_BUFFERS | INSTRUMENT_TIMER)) + if (instrument_options & (INSTRUMENT_BUFFERS | INSTRUMENT_TIMER | INSTRUMENT_WAL)) { bool need_buffers = (instrument_options & INSTRUMENT_BUFFERS) != 0; + bool need_wal = (instrument_options & INSTRUMENT_WAL) != 0; bool need_timer = (instrument_options & INSTRUMENT_TIMER) != 0; int i; for (i = 0; i < n; i++) { instr[i].need_bufusage = need_buffers; + instr[i].need_walusage = need_wal; instr[i].need_timer = need_timer; } } @@ -55,6 +62,7 @@ InstrInit(Instrumentation *instr, int instrument_options) { memset(instr, 0, sizeof(Instrumentation)); instr->need_bufusage = (instrument_options & INSTRUMENT_BUFFERS) != 0; + instr->need_walusage = (instrument_options & INSTRUMENT_WAL) != 0; instr->need_timer = (instrument_options & INSTRUMENT_TIMER) != 0; } @@ -69,6 +77,9 @@ InstrStartNode(Instrumentation *instr) /* save buffer usage totals at node entry, if needed */ if (instr->need_bufusage) instr->bufusage_start = pgBufferUsage; + + if (instr->need_walusage) + instr->walusage_start = pgWalUsage; } /* Exit from a plan node */ @@ -97,6 +108,10 @@ InstrStopNode(Instrumentation *instr, double nTuples) BufferUsageAccumDiff(&instr->bufusage, &pgBufferUsage, &instr->bufusage_start); + if (instr->need_walusage) + WalUsageAccumDiff(&instr->walusage, + &pgWalUsage, &instr->walusage_start); + /* Is this the first tuple of this cycle? */ if (!instr->running) { @@ -160,6 +175,9 @@ InstrAggNode(Instrumentation *dst, Instrumentation *add) /* Add delta of buffer usage since entry to node's totals */ if (dst->need_bufusage) BufferUsageAdd(&dst->bufusage, &add->bufusage); + + if (dst->need_walusage) + WalUsageAdd(&dst->walusage, &add->walusage); } /* note current values during parallel executor startup */ @@ -167,21 +185,25 @@ void InstrStartParallelQuery(void) { save_pgBufferUsage = pgBufferUsage; + save_pgWalUsage = pgWalUsage; } /* report usage after parallel executor shutdown */ void -InstrEndParallelQuery(BufferUsage *result) +InstrEndParallelQuery(BufferUsage *bufusage, WalUsage *walusage) { - memset(result, 0, sizeof(BufferUsage)); - BufferUsageAccumDiff(result, &pgBufferUsage, &save_pgBufferUsage); + memset(bufusage, 0, sizeof(BufferUsage)); + memset(walusage, 0, sizeof(WalUsage)); + BufferUsageAccumDiff(bufusage, &pgBufferUsage, &save_pgBufferUsage); + WalUsageAccumDiff(walusage, &pgWalUsage, &save_pgWalUsage); } /* accumulate work done by workers in leader's stats */ void -InstrAccumParallelQuery(BufferUsage *result) +InstrAccumParallelQuery(BufferUsage *bufusage, WalUsage *walusage) { - BufferUsageAdd(&pgBufferUsage, result); + BufferUsageAdd(&pgBufferUsage, bufusage); + WalUsageAdd(&pgWalUsage, walusage); } /* dst += add */ @@ -223,3 +245,20 @@ BufferUsageAccumDiff(BufferUsage *dst, INSTR_TIME_ACCUM_DIFF(dst->blk_write_time, add->blk_write_time, sub->blk_write_time); } + +/* helper functions for WAL usage accumulation */ +static void +WalUsageAdd(WalUsage *dst, WalUsage *add) +{ + dst->wal_bytes += add->wal_bytes; + dst->wal_records += add->wal_records; + dst->wal_fp_records += add->wal_fp_records; +} + +static void +WalUsageAccumDiff(WalUsage *dst, const WalUsage *add, const WalUsage *sub) +{ + dst->wal_bytes += add->wal_bytes - sub->wal_bytes; + dst->wal_records += add->wal_records - sub->wal_records; + dst->wal_fp_records += add->wal_fp_records - sub->wal_fp_records; +} diff --git a/src/include/executor/execParallel.h b/src/include/executor/execParallel.h index 17d07cf020..1cc5b524fd 100644 --- a/src/include/executor/execParallel.h +++ b/src/include/executor/execParallel.h @@ -26,6 +26,7 @@ typedef struct ParallelExecutorInfo PlanState *planstate; /* plan subtree we're running in parallel */ ParallelContext *pcxt; /* parallel context we're using */ BufferUsage *buffer_usage; /* points to bufusage area in DSM */ + WalUsage *wal_usage; /* walusage area in DSM */ SharedExecutorInstrumentation *instrumentation; /* optional */ struct SharedJitInstrumentation *jit_instrumentation; /* optional */ dsa_area *area; /* points to DSA area in DSM */ diff --git a/src/include/executor/instrument.h b/src/include/executor/instrument.h index f48d46aede..f79fac8f8c 100644 --- a/src/include/executor/instrument.h +++ b/src/include/executor/instrument.h @@ -32,12 +32,20 @@ typedef struct BufferUsage instr_time blk_write_time; /* time spent writing */ } BufferUsage; +typedef struct WalUsage +{ + long wal_records; /* # of wal records produced */ + long wal_fp_records; /* # of full page wal records produced */ + long wal_bytes; /* size of wal records produced */ +} WalUsage; + /* Flag bits included in InstrAlloc's instrument_options bitmask */ typedef enum InstrumentOption { INSTRUMENT_TIMER = 1 << 0, /* needs timer (and row counts) */ INSTRUMENT_BUFFERS = 1 << 1, /* needs buffer usage */ INSTRUMENT_ROWS = 1 << 2, /* needs row count */ + INSTRUMENT_WAL = 1 << 3, /* needs wal usage */ INSTRUMENT_ALL = PG_INT32_MAX } InstrumentOption; @@ -46,6 +54,7 @@ typedef struct Instrumentation /* Parameters set at node creation: */ bool need_timer; /* true if we need timer data */ bool need_bufusage; /* true if we need buffer usage data */ + bool need_walusage; /* true if we need wal usage data */ /* Info about current plan cycle: */ bool running; /* true if we've completed first tuple */ instr_time starttime; /* start time of current iteration of node */ @@ -53,6 +62,7 @@ typedef struct Instrumentation double firsttuple; /* time for first tuple of this cycle */ double tuplecount; /* # of tuples emitted so far this cycle */ BufferUsage bufusage_start; /* buffer usage at start */ + WalUsage walusage_start; /* wal usage at start */ /* Accumulated statistics across all completed cycles: */ double startup; /* total startup time (in seconds) */ double total; /* total time (in seconds) */ @@ -62,6 +72,7 @@ typedef struct Instrumentation double nfiltered1; /* # of tuples removed by scanqual or joinqual */ double nfiltered2; /* # of tuples removed by "other" quals */ BufferUsage bufusage; /* total buffer usage */ + WalUsage walusage; /* total wal usage */ } Instrumentation; typedef struct WorkerInstrumentation @@ -71,6 +82,7 @@ typedef struct WorkerInstrumentation } WorkerInstrumentation; extern PGDLLIMPORT BufferUsage pgBufferUsage; +extern PGDLLIMPORT WalUsage pgWalUsage; extern Instrumentation *InstrAlloc(int n, int instrument_options); extern void InstrInit(Instrumentation *instr, int instrument_options); @@ -79,7 +91,7 @@ extern void InstrStopNode(Instrumentation *instr, double nTuples); extern void InstrEndLoop(Instrumentation *instr); extern void InstrAggNode(Instrumentation *dst, Instrumentation *add); extern void InstrStartParallelQuery(void); -extern void InstrEndParallelQuery(BufferUsage *result); -extern void InstrAccumParallelQuery(BufferUsage *result); +extern void InstrEndParallelQuery(BufferUsage *bufusage, WalUsage *walusage); +extern void InstrAccumParallelQuery(BufferUsage *bufusage, WalUsage *walusage); #endif /* INSTRUMENT_H */ diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list index e216de9570..88aed4c652 100644 --- a/src/tools/pgindent/typedefs.list +++ b/src/tools/pgindent/typedefs.list @@ -2632,6 +2632,7 @@ WalSndCtlData WalSndSendDataCallback WalSndState WalTimeSample +WalUsage WalWriteMethod Walfile WindowAgg
diff --git a/contrib/pg_stat_statements/Makefile b/contrib/pg_stat_statements/Makefile index 80042a0905..081f997d70 100644 --- a/contrib/pg_stat_statements/Makefile +++ b/contrib/pg_stat_statements/Makefile @@ -6,7 +6,8 @@ OBJS = \ pg_stat_statements.o EXTENSION = pg_stat_statements -DATA = pg_stat_statements--1.4.sql pg_stat_statements--1.6--1.7.sql \ +DATA = pg_stat_statements--1.4.sql \ + pg_stat_statements--1.7--1.8.sql pg_stat_statements--1.6--1.7.sql \ pg_stat_statements--1.5--1.6.sql pg_stat_statements--1.4--1.5.sql \ pg_stat_statements--1.3--1.4.sql pg_stat_statements--1.2--1.3.sql \ pg_stat_statements--1.1--1.2.sql pg_stat_statements--1.0--1.1.sql diff --git a/contrib/pg_stat_statements/expected/pg_stat_statements.out b/contrib/pg_stat_statements/expected/pg_stat_statements.out index 6787ec1efd..8ed985b2b6 100644 --- a/contrib/pg_stat_statements/expected/pg_stat_statements.out +++ b/contrib/pg_stat_statements/expected/pg_stat_statements.out @@ -195,20 +195,123 @@ SELECT * FROM test WHERE a IN (1, 2, 3, 4, 5); 3 | c (8 rows) -SELECT query, calls, rows FROM pg_stat_statements ORDER BY query COLLATE "C"; - query | calls | rows --------------------------------------------------------------+-------+------ - DELETE FROM test WHERE a > $1 | 1 | 1 - INSERT INTO test (a, b) VALUES ($1, $2), ($3, $4), ($5, $6) | 1 | 3 - INSERT INTO test VALUES(generate_series($1, $2), $3) | 1 | 10 - SELECT * FROM test ORDER BY a | 1 | 12 - SELECT * FROM test WHERE a > $1 ORDER BY a | 2 | 4 - SELECT * FROM test WHERE a IN ($1, $2, $3, $4, $5) | 1 | 8 - SELECT pg_stat_statements_reset() | 1 | 1 - UPDATE test SET b = $1 WHERE a = $2 | 6 | 6 - UPDATE test SET b = $1 WHERE a > $2 | 1 | 3 +-- SELECT check WAL usage stats to confirm temp tables do not get stored in WAL +SELECT query, calls, rows, wal_write_bytes, wal_write_records FROM pg_stat_statements ORDER BY query COLLATE "C"; + query | calls | rows | wal_write_bytes | wal_write_records +-------------------------------------------------------------+-------+------+-----------------+------------------- + DELETE FROM test WHERE a > $1 | 1 | 1 | 0 | 0 + INSERT INTO test (a, b) VALUES ($1, $2), ($3, $4), ($5, $6) | 1 | 3 | 0 | 0 + INSERT INTO test VALUES(generate_series($1, $2), $3) | 1 | 10 | 0 | 0 + SELECT * FROM test ORDER BY a | 1 | 12 | 0 | 0 + SELECT * FROM test WHERE a > $1 ORDER BY a | 2 | 4 | 0 | 0 + SELECT * FROM test WHERE a IN ($1, $2, $3, $4, $5) | 1 | 8 | 0 | 0 + SELECT pg_stat_statements_reset() | 1 | 1 | 0 | 0 + UPDATE test SET b = $1 WHERE a = $2 | 6 | 6 | 0 | 0 + UPDATE test SET b = $1 WHERE a > $2 | 1 | 3 | 0 | 0 (9 rows) +-- CHECKPOINT before WAL tests to ensure test stability +CHECKPOINT; +-- +-- CRUD: INSERT SELECT UPDATE DELETE on test non-temp table to validate WAL generation metrics +-- +SELECT pg_stat_statements_reset(); + pg_stat_statements_reset +-------------------------- + +(1 row) + +-- utility "create table" should not be shown +CREATE TABLE pgss_test (a int, b char(20)); +INSERT INTO pgss_test VALUES(generate_series(1, 10), 'aaa'); +UPDATE pgss_test SET b = 'bbb' WHERE a > 7; +DELETE FROM pgss_test WHERE a > 9; +-- explicit transaction +BEGIN; +UPDATE pgss_test SET b = '111' WHERE a = 1 ; +COMMIT; +BEGIN \; +UPDATE pgss_test SET b = '222' WHERE a = 2 \; +COMMIT ; +UPDATE pgss_test SET b = '333' WHERE a = 3 \; +UPDATE pgss_test SET b = '444' WHERE a = 4 ; +BEGIN \; +UPDATE pgss_test SET b = '555' WHERE a = 5 \; +UPDATE pgss_test SET b = '666' WHERE a = 6 \; +COMMIT ; +-- many INSERT values +INSERT INTO pgss_test (a, b) VALUES (1, 'a'), (2, 'b'), (3, 'c'); +-- SELECT with constants +SELECT * FROM pgss_test WHERE a > 5 ORDER BY a ; + a | b +---+---------------------- + 6 | 666 + 7 | aaa + 8 | bbb + 9 | bbb +(4 rows) + +SELECT * + FROM pgss_test + WHERE a > 9 + ORDER BY a ; + a | b +---+--- +(0 rows) + +-- SELECT without constants +SELECT * FROM pgss_test ORDER BY a; + a | b +---+---------------------- + 1 | a + 1 | 111 + 2 | b + 2 | 222 + 3 | c + 3 | 333 + 4 | 444 + 5 | 555 + 6 | 666 + 7 | aaa + 8 | bbb + 9 | bbb +(12 rows) + +-- SELECT with IN clause +SELECT * FROM pgss_test WHERE a IN (1, 2, 3, 4, 5); + a | b +---+---------------------- + 1 | 111 + 2 | 222 + 3 | 333 + 4 | 444 + 5 | 555 + 1 | a + 2 | b + 3 | c +(8 rows) + +-- DROP test table +SET pg_stat_statements.track_utility = TRUE; +DROP TABLE pgss_test; +SET pg_stat_statements.track_utility = FALSE; +-- SELCT usage data, check WAL usage is reported, wal_write_records equal rows count for INSERT/UPDATE/DELETE +SELECT query, calls, rows, wal_write_bytes > 0 as wal_bytes_written, wal_write_records > 0 as wal_records_written, wal_write_records = rows as wal_records_as_rows FROM pg_stat_statements ORDER BY query COLLATE "C"; + query | calls | rows | wal_bytes_written | wal_records_written | wal_records_as_rows +------------------------------------------------------------------+-------+------+-------------------+---------------------+--------------------- + DELETE FROM pgss_test WHERE a > $1 | 1 | 1 | t | t | t + DROP TABLE pgss_test | 1 | 0 | t | t | f + INSERT INTO pgss_test (a, b) VALUES ($1, $2), ($3, $4), ($5, $6) | 1 | 3 | t | t | t + INSERT INTO pgss_test VALUES(generate_series($1, $2), $3) | 1 | 10 | t | t | t + SELECT * FROM pgss_test ORDER BY a | 1 | 12 | f | f | f + SELECT * FROM pgss_test WHERE a > $1 ORDER BY a | 2 | 4 | f | f | f + SELECT * FROM pgss_test WHERE a IN ($1, $2, $3, $4, $5) | 1 | 8 | f | f | f + SELECT pg_stat_statements_reset() | 1 | 1 | f | f | f + SET pg_stat_statements.track_utility = FALSE | 1 | 0 | f | f | t + UPDATE pgss_test SET b = $1 WHERE a = $2 | 6 | 6 | t | t | t + UPDATE pgss_test SET b = $1 WHERE a > $2 | 1 | 3 | t | t | t +(11 rows) + -- -- pg_stat_statements.track = none -- @@ -287,13 +390,13 @@ SELECT PLUS_ONE(10); 11 (1 row) -SELECT query, calls, rows FROM pg_stat_statements ORDER BY query COLLATE "C"; - query | calls | rows ------------------------------------+-------+------ - SELECT $1::TEXT | 1 | 1 - SELECT PLUS_ONE($1) | 2 | 2 - SELECT PLUS_TWO($1) | 2 | 2 - SELECT pg_stat_statements_reset() | 1 | 1 +SELECT query, calls, rows, wal_write_bytes, wal_write_records FROM pg_stat_statements ORDER BY query COLLATE "C"; + query | calls | rows | wal_write_bytes | wal_write_records +-----------------------------------+-------+------+-----------------+------------------- + SELECT $1::TEXT | 1 | 1 | 0 | 0 + SELECT PLUS_ONE($1) | 2 | 2 | 0 | 0 + SELECT PLUS_TWO($1) | 2 | 2 | 0 | 0 + SELECT pg_stat_statements_reset() | 1 | 1 | 0 | 0 (4 rows) -- diff --git a/contrib/pg_stat_statements/pg_stat_statements--1.7--1.8.sql b/contrib/pg_stat_statements/pg_stat_statements--1.7--1.8.sql new file mode 100644 index 0000000000..f8b79f2277 --- /dev/null +++ b/contrib/pg_stat_statements/pg_stat_statements--1.7--1.8.sql @@ -0,0 +1,50 @@ +/* contrib/pg_stat_statements/pg_stat_statements--1.7--1.8.sql */ + +-- complain if script is sourced in psql, rather than via ALTER EXTENSION +\echo Use "ALTER EXTENSION pg_stat_statements UPDATE TO '1.8'" to load this file. \quit + +/* First we have to remove them from the extension */ +ALTER EXTENSION pg_stat_statements DROP VIEW pg_stat_statements; +ALTER EXTENSION pg_stat_statements DROP FUNCTION pg_stat_statements(boolean); + +/* Then we can drop them */ +DROP VIEW pg_stat_statements; +DROP FUNCTION pg_stat_statements(boolean); + +/* Now redefine */ +CREATE FUNCTION pg_stat_statements(IN showtext boolean, + OUT userid oid, + OUT dbid oid, + OUT queryid bigint, + OUT query text, + OUT calls int8, + OUT total_time float8, + OUT min_time float8, + OUT max_time float8, + OUT mean_time float8, + OUT stddev_time float8, + OUT rows int8, + OUT shared_blks_hit int8, + OUT shared_blks_read int8, + OUT shared_blks_dirtied int8, + OUT shared_blks_written int8, + OUT local_blks_hit int8, + OUT local_blks_read int8, + OUT local_blks_dirtied int8, + OUT local_blks_written int8, + OUT temp_blks_read int8, + OUT temp_blks_written int8, + OUT blk_read_time float8, + OUT blk_write_time float8, + OUT wal_write_bytes int8, + OUT wal_write_records int8, + OUT wal_write_fp_records int8 +) +RETURNS SETOF record +AS 'MODULE_PATHNAME', 'pg_stat_statements_1_4' +LANGUAGE C STRICT VOLATILE; + +CREATE VIEW pg_stat_statements AS + SELECT * FROM pg_stat_statements(true); + +GRANT SELECT ON pg_stat_statements TO PUBLIC; diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c index 20dc8c605b..1c256fc395 100644 --- a/contrib/pg_stat_statements/pg_stat_statements.c +++ b/contrib/pg_stat_statements/pg_stat_statements.c @@ -120,7 +120,8 @@ typedef enum pgssVersion PGSS_V1_0 = 0, PGSS_V1_1, PGSS_V1_2, - PGSS_V1_3 + PGSS_V1_3, + PGSS_V1_4 } pgssVersion; /* @@ -161,6 +162,9 @@ typedef struct Counters int64 local_blks_written; /* # of local disk blocks written */ int64 temp_blks_read; /* # of temp blocks read */ int64 temp_blks_written; /* # of temp blocks written */ + int64 wal_bytes_written; /* total amount of wal bytes written */ + int64 wal_records_written; /* # of wal records written */ + int64 wal_fp_records_written; /* # of full page wal records written */ double blk_read_time; /* time spent reading, in msec */ double blk_write_time; /* time spent writing, in msec */ double usage; /* usage factor */ @@ -293,6 +297,7 @@ PG_FUNCTION_INFO_V1(pg_stat_statements_reset); PG_FUNCTION_INFO_V1(pg_stat_statements_reset_1_7); PG_FUNCTION_INFO_V1(pg_stat_statements_1_2); PG_FUNCTION_INFO_V1(pg_stat_statements_1_3); +PG_FUNCTION_INFO_V1(pg_stat_statements_1_4); PG_FUNCTION_INFO_V1(pg_stat_statements); static void pgss_shmem_startup(void); @@ -313,6 +318,7 @@ static void pgss_store(const char *query, uint64 queryId, int query_location, int query_len, double total_time, uint64 rows, const BufferUsage *bufusage, + const WalUsage* walusage, pgssJumbleState *jstate); static void pg_stat_statements_internal(FunctionCallInfo fcinfo, pgssVersion api_version, @@ -841,6 +847,7 @@ pgss_post_parse_analyze(ParseState *pstate, Query *query) 0, 0, NULL, + NULL, &jstate); } @@ -944,6 +951,7 @@ pgss_ExecutorEnd(QueryDesc *queryDesc) queryDesc->totaltime->total * 1000.0, /* convert to msec */ queryDesc->estate->es_processed, &queryDesc->totaltime->bufusage, + &queryDesc->totaltime->walusage, NULL); } @@ -989,7 +997,11 @@ pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString, BufferUsage bufusage_start, bufusage; + WalUsage walusage_start, + walusage; + bufusage_start = pgBufferUsage; + walusage_start = pgWalUsage; INSTR_TIME_SET_CURRENT(start); nested_level++; @@ -1041,6 +1053,13 @@ pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString, bufusage.blk_write_time = pgBufferUsage.blk_write_time; INSTR_TIME_SUBTRACT(bufusage.blk_write_time, bufusage_start.blk_write_time); + walusage.wal_bytes = + pgWalUsage.wal_bytes - walusage_start.wal_bytes; + walusage.wal_records = + pgWalUsage.wal_records - walusage_start.wal_records; + walusage.wal_fp_records = + pgWalUsage.wal_fp_records - walusage_start.wal_fp_records; + pgss_store(queryString, 0, /* signal that it's a utility stmt */ pstmt->stmt_location, @@ -1048,6 +1067,7 @@ pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString, INSTR_TIME_GET_MILLISEC(duration), rows, &bufusage, + &walusage, NULL); } else @@ -1083,13 +1103,14 @@ pgss_hash_string(const char *str, int len) * * If jstate is not NULL then we're trying to create an entry for which * we have no statistics as yet; we just want to record the normalized - * query string. total_time, rows, bufusage are ignored in this case. + * query string. Time and usage are ignored in this case. */ static void pgss_store(const char *query, uint64 queryId, int query_location, int query_len, double total_time, uint64 rows, const BufferUsage *bufusage, + const WalUsage* walusage, pgssJumbleState *jstate) { pgssHashKey key; @@ -1281,6 +1302,9 @@ pgss_store(const char *query, uint64 queryId, e->counters.blk_read_time += INSTR_TIME_GET_MILLISEC(bufusage->blk_read_time); e->counters.blk_write_time += INSTR_TIME_GET_MILLISEC(bufusage->blk_write_time); e->counters.usage += USAGE_EXEC(total_time); + e->counters.wal_bytes_written += walusage->wal_bytes; + e->counters.wal_records_written += walusage->wal_records; + e->counters.wal_fp_records_written += walusage->wal_fp_records; SpinLockRelease(&e->mutex); } @@ -1328,7 +1352,8 @@ pg_stat_statements_reset(PG_FUNCTION_ARGS) #define PG_STAT_STATEMENTS_COLS_V1_1 18 #define PG_STAT_STATEMENTS_COLS_V1_2 19 #define PG_STAT_STATEMENTS_COLS_V1_3 23 -#define PG_STAT_STATEMENTS_COLS 23 /* maximum of above */ +#define PG_STAT_STATEMENTS_COLS_V1_4 26 +#define PG_STAT_STATEMENTS_COLS 26 /* maximum of above */ /* * Retrieve statement statistics. @@ -1340,6 +1365,15 @@ pg_stat_statements_reset(PG_FUNCTION_ARGS) * expected API version is identified by embedding it in the C name of the * function. Unfortunately we weren't bright enough to do that for 1.1. */ +Datum pg_stat_statements_1_4(PG_FUNCTION_ARGS) +{ + bool showtext = PG_GETARG_BOOL(0); + + pg_stat_statements_internal(fcinfo, PGSS_V1_4, showtext); + + return (Datum)0; +} + Datum pg_stat_statements_1_3(PG_FUNCTION_ARGS) { @@ -1445,6 +1479,10 @@ pg_stat_statements_internal(FunctionCallInfo fcinfo, if (api_version != PGSS_V1_3) elog(ERROR, "incorrect number of output arguments"); break; + case PG_STAT_STATEMENTS_COLS_V1_4: + if (api_version != PGSS_V1_4) + elog(ERROR, "incorrect number of output arguments"); + break; default: elog(ERROR, "incorrect number of output arguments"); } @@ -1641,11 +1679,18 @@ pg_stat_statements_internal(FunctionCallInfo fcinfo, values[i++] = Float8GetDatumFast(tmp.blk_read_time); values[i++] = Float8GetDatumFast(tmp.blk_write_time); } + if (api_version >= PGSS_V1_4) + { + values[i++] = Int64GetDatumFast(tmp.wal_bytes_written); + values[i++] = Int64GetDatumFast(tmp.wal_records_written); + values[i++] = Int64GetDatumFast(tmp.wal_fp_records_written); + } Assert(i == (api_version == PGSS_V1_0 ? PG_STAT_STATEMENTS_COLS_V1_0 : api_version == PGSS_V1_1 ? PG_STAT_STATEMENTS_COLS_V1_1 : api_version == PGSS_V1_2 ? PG_STAT_STATEMENTS_COLS_V1_2 : api_version == PGSS_V1_3 ? PG_STAT_STATEMENTS_COLS_V1_3 : + api_version == PGSS_V1_4 ? PG_STAT_STATEMENTS_COLS_V1_4 : -1 /* fail if you forget to update this assert */ )); tuplestore_putvalues(tupstore, tupdesc, values, nulls); diff --git a/contrib/pg_stat_statements/pg_stat_statements.control b/contrib/pg_stat_statements/pg_stat_statements.control index 14cb422354..7fb20df886 100644 --- a/contrib/pg_stat_statements/pg_stat_statements.control +++ b/contrib/pg_stat_statements/pg_stat_statements.control @@ -1,5 +1,5 @@ # pg_stat_statements extension comment = 'track execution statistics of all SQL statements executed' -default_version = '1.7' +default_version = '1.8' module_pathname = '$libdir/pg_stat_statements' relocatable = true diff --git a/contrib/pg_stat_statements/sql/pg_stat_statements.sql b/contrib/pg_stat_statements/sql/pg_stat_statements.sql index 8b527070d4..5c45d85ba3 100644 --- a/contrib/pg_stat_statements/sql/pg_stat_statements.sql +++ b/contrib/pg_stat_statements/sql/pg_stat_statements.sql @@ -99,7 +99,65 @@ SELECT * FROM test ORDER BY a; -- SELECT with IN clause SELECT * FROM test WHERE a IN (1, 2, 3, 4, 5); -SELECT query, calls, rows FROM pg_stat_statements ORDER BY query COLLATE "C"; +-- SELECT check WAL usage stats to confirm temp tables do not get stored in WAL +SELECT query, calls, rows, wal_write_bytes, wal_write_records FROM pg_stat_statements ORDER BY query COLLATE "C"; + +-- CHECKPOINT before WAL tests to ensure test stability +CHECKPOINT; + +-- +-- CRUD: INSERT SELECT UPDATE DELETE on test non-temp table to validate WAL generation metrics +-- +SELECT pg_stat_statements_reset(); + +-- utility "create table" should not be shown +CREATE TABLE pgss_test (a int, b char(20)); + +INSERT INTO pgss_test VALUES(generate_series(1, 10), 'aaa'); +UPDATE pgss_test SET b = 'bbb' WHERE a > 7; +DELETE FROM pgss_test WHERE a > 9; + +-- explicit transaction +BEGIN; +UPDATE pgss_test SET b = '111' WHERE a = 1 ; +COMMIT; + +BEGIN \; +UPDATE pgss_test SET b = '222' WHERE a = 2 \; +COMMIT ; + +UPDATE pgss_test SET b = '333' WHERE a = 3 \; +UPDATE pgss_test SET b = '444' WHERE a = 4 ; + +BEGIN \; +UPDATE pgss_test SET b = '555' WHERE a = 5 \; +UPDATE pgss_test SET b = '666' WHERE a = 6 \; +COMMIT ; + +-- many INSERT values +INSERT INTO pgss_test (a, b) VALUES (1, 'a'), (2, 'b'), (3, 'c'); + +-- SELECT with constants +SELECT * FROM pgss_test WHERE a > 5 ORDER BY a ; + +SELECT * + FROM pgss_test + WHERE a > 9 + ORDER BY a ; + +-- SELECT without constants +SELECT * FROM pgss_test ORDER BY a; + +-- SELECT with IN clause +SELECT * FROM pgss_test WHERE a IN (1, 2, 3, 4, 5); + +-- DROP test table +SET pg_stat_statements.track_utility = TRUE; +DROP TABLE pgss_test; +SET pg_stat_statements.track_utility = FALSE; + +-- SELCT usage data, check WAL usage is reported, wal_write_records equal rows count for INSERT/UPDATE/DELETE +SELECT query, calls, rows, wal_write_bytes > 0 as wal_bytes_written, wal_write_records > 0 as wal_records_written, wal_write_records = rows as wal_records_as_rows FROM pg_stat_statements ORDER BY query COLLATE "C"; -- -- pg_stat_statements.track = none @@ -144,7 +202,7 @@ $$ SELECT (i + 1.0)::INTEGER LIMIT 1 $$ LANGUAGE SQL; SELECT PLUS_ONE(8); SELECT PLUS_ONE(10); -SELECT query, calls, rows FROM pg_stat_statements ORDER BY query COLLATE "C"; +SELECT query, calls, rows, wal_write_bytes, wal_write_records FROM pg_stat_statements ORDER BY query COLLATE "C"; -- -- pg_stat_statements.track = all diff --git a/doc/src/sgml/pgstatstatements.sgml b/doc/src/sgml/pgstatstatements.sgml index 26bb82da4a..af1c276752 100644 --- a/doc/src/sgml/pgstatstatements.sgml +++ b/doc/src/sgml/pgstatstatements.sgml @@ -221,6 +221,42 @@ </entry> </row> + <row> + <entry><structfield>wal_write_bytes</structfield></entry> + <entry><type>bigint</type></entry> + <entry></entry> + <entry> + Amount of WAL bytes added by the statement + </entry> + </row> + + <row> + <entry><structfield>wal_write_records</structfield></entry> + <entry><type>bigint</type></entry> + <entry></entry> + <entry> + Count of WAL records added by the statement + </entry> + </row> + + <row> + <entry><structfield>wal_write_bytes</structfield></entry> + <entry><type>bigint</type></entry> + <entry></entry> + <entry> + Amount of WAL bytes added by the statement + </entry> + </row> + + <row> + <entry><structfield>wal_write_records</structfield></entry> + <entry><type>bigint</type></entry> + <entry></entry> + <entry> + Count of WAL records added by the statement + </entry> + </row> + </tbody> </tgroup> </table>