On Thu, Apr 02, 2020 at 02:32:07PM +0530, Amit Kapila wrote: > On Thu, Apr 2, 2020 at 2:00 PM Julien Rouhaud <rjuju...@gmail.com> wrote: > > > > On Thu, Apr 02, 2020 at 11:07:29AM +0530, Amit Kapila wrote: > > > On Wed, Apr 1, 2020 at 8:00 PM Julien Rouhaud <rjuju...@gmail.com> wrote: > > > > > > > > On Wed, Apr 01, 2020 at 04:29:16PM +0530, Amit Kapila wrote: > > > > > 3. Doing some testing with and without parallelism to ensure WAL usage > > > > > data is correct would be great and if possible, share the results? > > > > > > > > > > > > I just saw that Dilip did some testing, but just in case here is some > > > > additional one > > > > > > > > - vacuum, after a truncate, loading 1M row and a "UPDATE t1 SET id = id" > > > > > > > > =# select query, calls, wal_bytes, wal_records, wal_num_fpw from > > > > pg_stat_statements where query ilike '%vacuum%'; > > > > query | calls | wal_bytes | wal_records | wal_num_fpw > > > > ------------------------+-------+-----------+-------------+------------- > > > > vacuum (parallel 3) t1 | 1 | 20098962 | 34104 | 2 > > > > vacuum (parallel 0) t1 | 1 | 20098962 | 34104 | 2 > > > > (2 rows) > > > > > > > > - create index, overload t1's parallel_workers, using the 1M line just > > > > vacuumed: > > > > > > > > =# alter table t1 set (parallel_workers = 2); > > > > ALTER TABLE > > > > > > > > =# create index t1_parallel_2 on t1(id); > > > > CREATE INDEX > > > > > > > > =# alter table t1 set (parallel_workers = 0); > > > > ALTER TABLE > > > > > > > > =# create index t1_parallel_0 on t1(id); > > > > CREATE INDEX > > > > > > > > =# select query, calls, wal_bytes, wal_records, wal_num_fpw from > > > > pg_stat_statements where query ilike '%create index%'; > > > > query | calls | wal_bytes | wal_records > > > > | wal_num_fpw > > > > --------------------------------------+-------+-----------+-------------+------------- > > > > create index t1_parallel_0 on t1(id) | 1 | 20355540 | 2762 > > > > | 2745 > > > > create index t1_parallel_2 on t1(id) | 1 | 20406811 | 2762 > > > > | 2758 > > > > (2 rows) > > > > > > > > It all looks good to me. > > > > > > > > > > Here the wal_num_fpw and wal_bytes are different between parallel and > > > non-parallel versions. Is it due to checkpoint or something else? We > > > can probably rule out checkpoint by increasing checkpoint_timeout and > > > other checkpoint related parameters. > > > > I think this is because I did a checkpoint after the VACUUM tests, so the > > 1st > > CREATE INDEX (with parallelism) induced some FPW on the catalog blocks. I > > didn't try to investigate more since: > > > > We need to do this. > > > On Thu, Apr 02, 2020 at 11:22:16AM +0530, Amit Kapila wrote: > > > > > > Also, I forgot to mention that let's not base this on buffer usage > > > patch for create index > > > (v10-0002-Allow-parallel-index-creation-to-accumulate-buff) because as > > > per recent discussion I am not sure about its usefulness. I think we > > > can proceed with this patch without > > > v10-0002-Allow-parallel-index-creation-to-accumulate-buff as well. > > > > > > Which is done in attached v11. > > > > Hmm, I haven't suggested removing the WAL usage from the parallel > create index. I just told not to use the infrastructure of another > patch. We bypass the buffer manager but do write WAL. See > _bt_blwritepage->log_newpage. So we need to accumulate WAL usage even > if we decide not to do anything about BufferUsage which means we need > to investigate the above inconsistency in wal_num_fpw and wal_bytes > between parallel and non-parallel version.
Oh, I thought that you wanted to wait on that part, as we'll probably change the parallel create index to report buffer access eventually. v12 attached with an adaptation of Sawada-san's original patch but only dealing with WAL activity. I did some more experiment, ensuring as much stability as possible: =# create table t1(id integer); CREATE TABLE =# insert into t1 select * from generate_series(1, 1000000); INSERT 0 1000000 =# select * from pg_stat_statements_reset() ; pg_stat_statements_reset -------------------------- (1 row) =# alter table t1 set (parallel_workers = 0); ALTER TABLE =# vacuum;checkpoint; VACUUM CHECKPOINT =# create index t1_idx_parallel_0 ON t1(id); CREATE INDEX =# alter table t1 set (parallel_workers = 1); ALTER TABLE =# vacuum;checkpoint; VACUUM CHECKPOINT =# create index t1_idx_parallel_1 ON t1(id); CREATE INDEX =# alter table t1 set (parallel_workers = 2); ALTER TABLE =# vacuum;checkpoint; VACUUM CHECKPOINT =# create index t1_idx_parallel_2 ON t1(id); CREATE INDEX =# alter table t1 set (parallel_workers = 3); ALTER TABLE =# vacuum;checkpoint; VACUUM CHECKPOINT =# create index t1_idx_parallel_3 ON t1(id); CREATE INDEX =# alter table t1 set (parallel_workers = 4); ALTER TABLE =# vacuum;checkpoint; VACUUM CHECKPOINT =# create index t1_idx_parallel_4 ON t1(id); CREATE INDEX =# alter table t1 set (parallel_workers = 5); ALTER TABLE =# vacuum;checkpoint; VACUUM CHECKPOINT =# create index t1_idx_parallel_5 ON t1(id); CREATE INDEX =# alter table t1 set (parallel_workers = 6); ALTER TABLE =# vacuum;checkpoint; VACUUM CHECKPOINT =# create index t1_idx_parallel_6 ON t1(id); CREATE INDEX =# alter table t1 set (parallel_workers = 7); ALTER TABLE =# vacuum;checkpoint; VACUUM CHECKPOINT =# create index t1_idx_parallel_7 ON t1(id); CREATE INDEX =# alter table t1 set (parallel_workers = 8); ALTER TABLE =# vacuum;checkpoint; VACUUM CHECKPOINT =# create index t1_idx_parallel_8 ON t1(id); CREATE INDEX =# alter table t1 set (parallel_workers = 0); ALTER TABLE =# vacuum;checkpoint; VACUUM CHECKPOINT =# create index t1_idx_parallel_0_bis ON t1(id); CREATE INDEX =# vacuum;checkpoint; VACUUM CHECKPOINT =# create index t1_idx_parallel_0_ter ON t1(id); CREATE INDEX =# select query, calls, wal_bytes, wal_records, wal_num_fpw from pg_stat_statements where query ilike '%create index%'; query | calls | wal_bytes | wal_records | wal_num_fpw ----------------------------------------------+-------+-----------+-------------+------------- create index t1_idx_parallel_0 ON t1(id) | 1 | 20389743 | 2762 | 2758 create index t1_idx_parallel_0_bis ON t1(id) | 1 | 20394391 | 2762 | 2758 create index t1_idx_parallel_0_ter ON t1(id) | 1 | 20395155 | 2762 | 2758 create index t1_idx_parallel_1 ON t1(id) | 1 | 20388335 | 2762 | 2758 create index t1_idx_parallel_2 ON t1(id) | 1 | 20389091 | 2762 | 2758 create index t1_idx_parallel_3 ON t1(id) | 1 | 20389847 | 2762 | 2758 create index t1_idx_parallel_4 ON t1(id) | 1 | 20390603 | 2762 | 2758 create index t1_idx_parallel_5 ON t1(id) | 1 | 20391359 | 2762 | 2758 create index t1_idx_parallel_6 ON t1(id) | 1 | 20392115 | 2762 | 2758 create index t1_idx_parallel_7 ON t1(id) | 1 | 20392871 | 2762 | 2758 create index t1_idx_parallel_8 ON t1(id) | 1 | 20393627 | 2762 | 2758 (11 rows) =# select relname, pg_relation_size(oid) from pg_class where relname like '%t1_id%'; relname | pg_relation_size -----------------------+------------------ t1_idx_parallel_0 | 22487040 t1_idx_parallel_0_bis | 22487040 t1_idx_parallel_0_ter | 22487040 t1_idx_parallel_2 | 22487040 t1_idx_parallel_1 | 22487040 t1_idx_parallel_4 | 22487040 t1_idx_parallel_3 | 22487040 t1_idx_parallel_5 | 22487040 t1_idx_parallel_6 | 22487040 t1_idx_parallel_7 | 22487040 t1_idx_parallel_8 | 22487040 (9 rows) So while the number of WAL records and full page images stay constant, we can see some small fluctuations in the total amount of generated WAL data, even for multiple execution of the sequential create index. I'm wondering if the fluctuations are due to some other internal details or if the WalUsage support is just completely broken (although I don't see any obvious issue ATM).
>From 694fe49a9973679ecda4a76b274ed135b753887e Mon Sep 17 00:00:00 2001 From: Kirill Bychik <kirill.byc...@gmail.com> Date: Tue, 17 Mar 2020 14:41:50 +0100 Subject: [PATCH v12 1/4] Add infrastructure to track WAL usage. Author: Kirill Bychik, Julien Rouhaud Reviewed-by: Fuji Masao Discussion: https://postgr.es/m/CAB-hujrP8ZfUkvL5OYETipQwA=e3n7oqHFU=4zlxws_cza3...@mail.gmail.com --- src/backend/access/heap/vacuumlazy.c | 37 ++++++++++++----- src/backend/access/nbtree/nbtsort.c | 40 +++++++++++++++++++ src/backend/access/transam/xlog.c | 12 +++++- src/backend/access/transam/xloginsert.c | 13 ++++-- src/backend/executor/execParallel.c | 38 +++++++++++++----- src/backend/executor/instrument.c | 53 ++++++++++++++++++++++--- src/include/access/xlog.h | 3 +- src/include/executor/execParallel.h | 1 + src/include/executor/instrument.h | 18 ++++++++- src/tools/pgindent/typedefs.list | 1 + 10 files changed, 183 insertions(+), 33 deletions(-) diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c index 9f9596c718..cc7e8521a5 100644 --- a/src/backend/access/heap/vacuumlazy.c +++ b/src/backend/access/heap/vacuumlazy.c @@ -139,6 +139,7 @@ #define PARALLEL_VACUUM_KEY_DEAD_TUPLES 2 #define PARALLEL_VACUUM_KEY_QUERY_TEXT 3 #define PARALLEL_VACUUM_KEY_BUFFER_USAGE 4 +#define PARALLEL_VACUUM_KEY_WAL_USAGE 5 /* * Macro to check if we are in a parallel vacuum. If true, we are in the @@ -275,6 +276,9 @@ typedef struct LVParallelState /* Points to buffer usage area in DSM */ BufferUsage *buffer_usage; + /* Points to WAL usage area in DSM */ + WalUsage *wal_usage; + /* * The number of indexes that support parallel index bulk-deletion and * parallel index cleanup respectively. @@ -2143,8 +2147,8 @@ lazy_parallel_vacuum_indexes(Relation *Irel, IndexBulkDeleteResult **stats, vacrelstats->dead_tuples, nindexes, vacrelstats); /* - * Next, accumulate buffer usage. (This must wait for the workers to - * finish, or we might get incomplete data.) + * Next, accumulate buffer and WAL usage. (This must wait for the workers + * to finish, or we might get incomplete data.) */ if (nworkers > 0) { @@ -2154,7 +2158,7 @@ lazy_parallel_vacuum_indexes(Relation *Irel, IndexBulkDeleteResult **stats, WaitForParallelWorkersToFinish(lps->pcxt); for (i = 0; i < lps->pcxt->nworkers_launched; i++) - InstrAccumParallelQuery(&lps->buffer_usage[i]); + InstrAccumParallelQuery(&lps->buffer_usage[i], &lps->wal_usage[i]); } /* @@ -3171,6 +3175,7 @@ begin_parallel_vacuum(Oid relid, Relation *Irel, LVRelStats *vacrelstats, LVShared *shared; LVDeadTuples *dead_tuples; BufferUsage *buffer_usage; + WalUsage *wal_usage; bool *can_parallel_vacuum; long maxtuples; char *sharedquery; @@ -3255,15 +3260,19 @@ begin_parallel_vacuum(Oid relid, Relation *Irel, LVRelStats *vacrelstats, shm_toc_estimate_keys(&pcxt->estimator, 1); /* - * Estimate space for BufferUsage -- PARALLEL_VACUUM_KEY_BUFFER_USAGE. + * Estimate space for BufferUsage and WalUsage -- + * PARALLEL_VACUUM_KEY_BUFFER_USAGE and PARALLEL_VACUUM_KEY_WAL_USAGE. * * If there are no extensions loaded that care, we could skip this. We - * have no way of knowing whether anyone's looking at pgBufferUsage, so do - * it unconditionally. + * have no way of knowing whether anyone's looking at pgBufferUsage or + * pgWalUsage, so do it unconditionally. */ shm_toc_estimate_chunk(&pcxt->estimator, mul_size(sizeof(BufferUsage), pcxt->nworkers)); shm_toc_estimate_keys(&pcxt->estimator, 1); + shm_toc_estimate_chunk(&pcxt->estimator, + mul_size(sizeof(WalUsage), pcxt->nworkers)); + shm_toc_estimate_keys(&pcxt->estimator, 1); /* Finally, estimate PARALLEL_VACUUM_KEY_QUERY_TEXT space */ querylen = strlen(debug_query_string); @@ -3299,11 +3308,18 @@ begin_parallel_vacuum(Oid relid, Relation *Irel, LVRelStats *vacrelstats, shm_toc_insert(pcxt->toc, PARALLEL_VACUUM_KEY_DEAD_TUPLES, dead_tuples); vacrelstats->dead_tuples = dead_tuples; - /* Allocate space for each worker's BufferUsage; no need to initialize */ + /* + * Allocate space for each worker's BufferUsage and WalUsage; no need to + * initialize + */ buffer_usage = shm_toc_allocate(pcxt->toc, mul_size(sizeof(BufferUsage), pcxt->nworkers)); shm_toc_insert(pcxt->toc, PARALLEL_VACUUM_KEY_BUFFER_USAGE, buffer_usage); lps->buffer_usage = buffer_usage; + wal_usage = shm_toc_allocate(pcxt->toc, + mul_size(sizeof(WalUsage), pcxt->nworkers)); + shm_toc_insert(pcxt->toc, PARALLEL_VACUUM_KEY_WAL_USAGE, wal_usage); + lps->wal_usage = wal_usage; /* Store query string for workers */ sharedquery = (char *) shm_toc_allocate(pcxt->toc, querylen + 1); @@ -3435,6 +3451,7 @@ parallel_vacuum_main(dsm_segment *seg, shm_toc *toc) LVShared *lvshared; LVDeadTuples *dead_tuples; BufferUsage *buffer_usage; + WalUsage *wal_usage; int nindexes; char *sharedquery; IndexBulkDeleteResult **stats; @@ -3511,9 +3528,11 @@ parallel_vacuum_main(dsm_segment *seg, shm_toc *toc) parallel_vacuum_index(indrels, stats, lvshared, dead_tuples, nindexes, &vacrelstats); - /* Report buffer usage during parallel execution */ + /* Report buffer/WAL usage during parallel execution */ buffer_usage = shm_toc_lookup(toc, PARALLEL_VACUUM_KEY_BUFFER_USAGE, false); - InstrEndParallelQuery(&buffer_usage[ParallelWorkerNumber]); + wal_usage = shm_toc_lookup(toc, PARALLEL_VACUUM_KEY_WAL_USAGE, false); + InstrEndParallelQuery(&buffer_usage[ParallelWorkerNumber], + &wal_usage[ParallelWorkerNumber]); /* Pop the error context stack */ error_context_stack = errcallback.previous; diff --git a/src/backend/access/nbtree/nbtsort.c b/src/backend/access/nbtree/nbtsort.c index 3924945664..3f4cb7d39e 100644 --- a/src/backend/access/nbtree/nbtsort.c +++ b/src/backend/access/nbtree/nbtsort.c @@ -67,6 +67,7 @@ #include "access/xloginsert.h" #include "catalog/index.h" #include "commands/progress.h" +#include "executor/instrument.h" #include "miscadmin.h" #include "pgstat.h" #include "storage/smgr.h" @@ -81,6 +82,7 @@ #define PARALLEL_KEY_TUPLESORT UINT64CONST(0xA000000000000002) #define PARALLEL_KEY_TUPLESORT_SPOOL2 UINT64CONST(0xA000000000000003) #define PARALLEL_KEY_QUERY_TEXT UINT64CONST(0xA000000000000004) +#define PARALLEL_KEY_WAL_USAGE UINT64CONST(0xA000000000000005) /* * DISABLE_LEADER_PARTICIPATION disables the leader's participation in @@ -203,6 +205,7 @@ typedef struct BTLeader Sharedsort *sharedsort; Sharedsort *sharedsort2; Snapshot snapshot; + WalUsage *walusage; } BTLeader; /* @@ -1476,6 +1479,7 @@ _bt_begin_parallel(BTBuildState *buildstate, bool isconcurrent, int request) Sharedsort *sharedsort2; BTSpool *btspool = buildstate->spool; BTLeader *btleader = (BTLeader *) palloc0(sizeof(BTLeader)); + WalUsage *walusage; bool leaderparticipates = true; char *sharedquery; int querylen; @@ -1528,6 +1532,18 @@ _bt_begin_parallel(BTBuildState *buildstate, bool isconcurrent, int request) shm_toc_estimate_keys(&pcxt->estimator, 3); } + /* + * Estimate space for WalUsage -- PARALLEL_KEY_WAL_USAGE + * + * WalUsage during execution of maintenance command can be used by an + * extension that reports the WAL usage, such as pg_stat_statements. + * We have no way of knowing whether anyone's looking at pgWalUsage, + * so do it unconditionally. + */ + shm_toc_estimate_chunk(&pcxt->estimator, + mul_size(sizeof(WalUsage), pcxt->nworkers)); + shm_toc_estimate_keys(&pcxt->estimator, 1); + /* Finally, estimate PARALLEL_KEY_QUERY_TEXT space */ querylen = strlen(debug_query_string); shm_toc_estimate_chunk(&pcxt->estimator, querylen + 1); @@ -1599,6 +1615,11 @@ _bt_begin_parallel(BTBuildState *buildstate, bool isconcurrent, int request) memcpy(sharedquery, debug_query_string, querylen + 1); shm_toc_insert(pcxt->toc, PARALLEL_KEY_QUERY_TEXT, sharedquery); + /* Allocate space for each worker's WalUsage; no need to initialize */ + walusage = shm_toc_allocate(pcxt->toc, + mul_size(sizeof(WalUsage), pcxt->nworkers)); + shm_toc_insert(pcxt->toc, PARALLEL_KEY_WAL_USAGE, walusage); + /* Launch workers, saving status for leader/caller */ LaunchParallelWorkers(pcxt); btleader->pcxt = pcxt; @@ -1609,6 +1630,7 @@ _bt_begin_parallel(BTBuildState *buildstate, bool isconcurrent, int request) btleader->sharedsort = sharedsort; btleader->sharedsort2 = sharedsort2; btleader->snapshot = snapshot; + btleader->walusage = walusage; /* If no workers were successfully launched, back out (do serial build) */ if (pcxt->nworkers_launched == 0) @@ -1637,8 +1659,18 @@ _bt_begin_parallel(BTBuildState *buildstate, bool isconcurrent, int request) static void _bt_end_parallel(BTLeader *btleader) { + int i; + /* Shutdown worker processes */ WaitForParallelWorkersToFinish(btleader->pcxt); + + /* + * Next, accumulate WAL usage. (This must wait for the workers to + * finish, or we might get incomplete data.) + */ + for (i = 0; i < btleader->pcxt->nworkers_launched; i++) + InstrAccumParallelQuery(NULL, &btleader->walusage[i]); + /* Free last reference to MVCC snapshot, if one was used */ if (IsMVCCSnapshot(btleader->snapshot)) UnregisterSnapshot(btleader->snapshot); @@ -1769,6 +1801,7 @@ _bt_parallel_build_main(dsm_segment *seg, shm_toc *toc) Relation indexRel; LOCKMODE heapLockmode; LOCKMODE indexLockmode; + WalUsage *walusage; int sortmem; #ifdef BTREE_BUILD_STATS @@ -1830,11 +1863,18 @@ _bt_parallel_build_main(dsm_segment *seg, shm_toc *toc) tuplesort_attach_shared(sharedsort2, seg); } + /* Prepare to track buffer usage during parallel execution */ + InstrStartParallelQuery(); + /* Perform sorting of spool, and possibly a spool2 */ sortmem = maintenance_work_mem / btshared->scantuplesortstates; _bt_parallel_scan_and_sort(btspool, btspool2, btshared, sharedsort, sharedsort2, sortmem, false); + /* Report WAL usage during parallel execution */ + walusage = shm_toc_lookup(toc, PARALLEL_KEY_WAL_USAGE, false); + InstrEndParallelQuery(NULL, &walusage[ParallelWorkerNumber]); + #ifdef BTREE_BUILD_STATS if (log_btree_build_stats) { diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index 977d448f50..50b78f3143 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -43,6 +43,7 @@ #include "commands/progress.h" #include "commands/tablespace.h" #include "common/controldata_utils.h" +#include "executor/instrument.h" #include "miscadmin.h" #include "pg_trace.h" #include "pgstat.h" @@ -996,7 +997,8 @@ static void WALInsertLockUpdateInsertingAt(XLogRecPtr insertingAt); XLogRecPtr XLogInsertRecord(XLogRecData *rdata, XLogRecPtr fpw_lsn, - uint8 flags) + uint8 flags, + int num_fpw) { XLogCtlInsert *Insert = &XLogCtl->Insert; pg_crc32c rdata_crc; @@ -1252,6 +1254,14 @@ XLogInsertRecord(XLogRecData *rdata, ProcLastRecPtr = StartPos; XactLastRecEnd = EndPos; + /* Report WAL traffic to the instrumentation. */ + if (inserted) + { + pgWalUsage.wal_bytes += rechdr->xl_tot_len; + pgWalUsage.wal_records++; + pgWalUsage.wal_num_fpw += num_fpw; + } + return EndPos; } diff --git a/src/backend/access/transam/xloginsert.c b/src/backend/access/transam/xloginsert.c index a618dec776..5e032e7042 100644 --- a/src/backend/access/transam/xloginsert.c +++ b/src/backend/access/transam/xloginsert.c @@ -25,6 +25,7 @@ #include "access/xloginsert.h" #include "catalog/pg_control.h" #include "common/pg_lzcompress.h" +#include "executor/instrument.h" #include "miscadmin.h" #include "pg_trace.h" #include "replication/origin.h" @@ -108,7 +109,7 @@ static MemoryContext xloginsert_cxt; static XLogRecData *XLogRecordAssemble(RmgrId rmid, uint8 info, XLogRecPtr RedoRecPtr, bool doPageWrites, - XLogRecPtr *fpw_lsn); + XLogRecPtr *fpw_lsn, int *num_fpw); static bool XLogCompressBackupBlock(char *page, uint16 hole_offset, uint16 hole_length, char *dest, uint16 *dlen); @@ -448,6 +449,7 @@ XLogInsert(RmgrId rmid, uint8 info) bool doPageWrites; XLogRecPtr fpw_lsn; XLogRecData *rdt; + int num_fpw = 0; /* * Get values needed to decide whether to do full-page writes. Since @@ -457,9 +459,9 @@ XLogInsert(RmgrId rmid, uint8 info) GetFullPageWriteInfo(&RedoRecPtr, &doPageWrites); rdt = XLogRecordAssemble(rmid, info, RedoRecPtr, doPageWrites, - &fpw_lsn); + &fpw_lsn, &num_fpw); - EndPos = XLogInsertRecord(rdt, fpw_lsn, curinsert_flags); + EndPos = XLogInsertRecord(rdt, fpw_lsn, curinsert_flags, num_fpw); } while (EndPos == InvalidXLogRecPtr); XLogResetInsertion(); @@ -482,7 +484,7 @@ XLogInsert(RmgrId rmid, uint8 info) static XLogRecData * XLogRecordAssemble(RmgrId rmid, uint8 info, XLogRecPtr RedoRecPtr, bool doPageWrites, - XLogRecPtr *fpw_lsn) + XLogRecPtr *fpw_lsn, int *num_fpw) { XLogRecData *rdt; uint32 total_len = 0; @@ -635,6 +637,9 @@ XLogRecordAssemble(RmgrId rmid, uint8 info, */ bkpb.fork_flags |= BKPBLOCK_HAS_IMAGE; + /* Report a full page image constructed for the WAL record */ + *num_fpw += 1; + /* * Construct XLogRecData entries for the page content. */ diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c index a753d6efa0..7d9ca66fc8 100644 --- a/src/backend/executor/execParallel.c +++ b/src/backend/executor/execParallel.c @@ -12,7 +12,7 @@ * workers and ensuring that their state generally matches that of the * leader; see src/backend/access/transam/README.parallel for details. * However, we must save and restore relevant executor state, such as - * any ParamListInfo associated with the query, buffer usage info, and + * any ParamListInfo associated with the query, buffer/WAL usage info, and * the actual plan to be passed down to the worker. * * IDENTIFICATION @@ -62,6 +62,7 @@ #define PARALLEL_KEY_DSA UINT64CONST(0xE000000000000007) #define PARALLEL_KEY_QUERY_TEXT UINT64CONST(0xE000000000000008) #define PARALLEL_KEY_JIT_INSTRUMENTATION UINT64CONST(0xE000000000000009) +#define PARALLEL_KEY_WAL_USAGE UINT64CONST(0xE00000000000000A) #define PARALLEL_TUPLE_QUEUE_SIZE 65536 @@ -573,6 +574,7 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate, char *pstmt_space; char *paramlistinfo_space; BufferUsage *bufusage_space; + WalUsage *walusage_space; SharedExecutorInstrumentation *instrumentation = NULL; SharedJitInstrumentation *jit_instrumentation = NULL; int pstmt_len; @@ -646,6 +648,13 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate, mul_size(sizeof(BufferUsage), pcxt->nworkers)); shm_toc_estimate_keys(&pcxt->estimator, 1); + /* + * Same thing for WalUsage. + */ + shm_toc_estimate_chunk(&pcxt->estimator, + mul_size(sizeof(WalUsage), pcxt->nworkers)); + shm_toc_estimate_keys(&pcxt->estimator, 1); + /* Estimate space for tuple queues. */ shm_toc_estimate_chunk(&pcxt->estimator, mul_size(PARALLEL_TUPLE_QUEUE_SIZE, pcxt->nworkers)); @@ -728,6 +737,12 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate, shm_toc_insert(pcxt->toc, PARALLEL_KEY_BUFFER_USAGE, bufusage_space); pei->buffer_usage = bufusage_space; + /* Same for WalUsage. */ + walusage_space = shm_toc_allocate(pcxt->toc, + mul_size(sizeof(WalUsage), pcxt->nworkers)); + shm_toc_insert(pcxt->toc, PARALLEL_KEY_WAL_USAGE, walusage_space); + pei->wal_usage = walusage_space; + /* Set up the tuple queues that the workers will write into. */ pei->tqueue = ExecParallelSetupTupleQueues(pcxt, false); @@ -1069,7 +1084,7 @@ ExecParallelRetrieveJitInstrumentation(PlanState *planstate, /* * Finish parallel execution. We wait for parallel workers to finish, and - * accumulate their buffer usage. + * accumulate their buffer/WAL usage. */ void ExecParallelFinish(ParallelExecutorInfo *pei) @@ -1109,11 +1124,11 @@ ExecParallelFinish(ParallelExecutorInfo *pei) WaitForParallelWorkersToFinish(pei->pcxt); /* - * Next, accumulate buffer usage. (This must wait for the workers to - * finish, or we might get incomplete data.) + * Next, accumulate buffer/WAL usage. (This must wait for the workers + * to finish, or we might get incomplete data.) */ for (i = 0; i < nworkers; i++) - InstrAccumParallelQuery(&pei->buffer_usage[i]); + InstrAccumParallelQuery(&pei->buffer_usage[i], &pei->wal_usage[i]); pei->finished = true; } @@ -1333,6 +1348,7 @@ ParallelQueryMain(dsm_segment *seg, shm_toc *toc) { FixedParallelExecutorState *fpes; BufferUsage *buffer_usage; + WalUsage *wal_usage; DestReceiver *receiver; QueryDesc *queryDesc; SharedExecutorInstrumentation *instrumentation; @@ -1386,11 +1402,11 @@ ParallelQueryMain(dsm_segment *seg, shm_toc *toc) ExecSetTupleBound(fpes->tuples_needed, queryDesc->planstate); /* - * Prepare to track buffer usage during query execution. + * Prepare to track buffer/WAL usage during query execution. * * We do this after starting up the executor to match what happens in the - * leader, which also doesn't count buffer accesses that occur during - * executor startup. + * leader, which also doesn't count buffer accesses and WAL activity that + * occur during executor startup. */ InstrStartParallelQuery(); @@ -1406,9 +1422,11 @@ ParallelQueryMain(dsm_segment *seg, shm_toc *toc) /* Shut down the executor */ ExecutorFinish(queryDesc); - /* Report buffer usage during parallel execution. */ + /* Report buffer/WAL usage during parallel execution. */ buffer_usage = shm_toc_lookup(toc, PARALLEL_KEY_BUFFER_USAGE, false); - InstrEndParallelQuery(&buffer_usage[ParallelWorkerNumber]); + wal_usage = shm_toc_lookup (toc, PARALLEL_KEY_WAL_USAGE, false); + InstrEndParallelQuery(&buffer_usage[ParallelWorkerNumber], + &wal_usage[ParallelWorkerNumber]); /* Report instrumentation data if any instrumentation options are set. */ if (instrumentation != NULL) diff --git a/src/backend/executor/instrument.c b/src/backend/executor/instrument.c index 042e10f96b..a77571a895 100644 --- a/src/backend/executor/instrument.c +++ b/src/backend/executor/instrument.c @@ -19,8 +19,11 @@ BufferUsage pgBufferUsage; static BufferUsage save_pgBufferUsage; +WalUsage pgWalUsage; +static WalUsage save_pgWalUsage; static void BufferUsageAdd(BufferUsage *dst, const BufferUsage *add); +static void WalUsageAdd(WalUsage *dst, WalUsage *add); /* Allocate new instrumentation structure(s) */ @@ -31,15 +34,17 @@ InstrAlloc(int n, int instrument_options) /* initialize all fields to zeroes, then modify as needed */ instr = palloc0(n * sizeof(Instrumentation)); - if (instrument_options & (INSTRUMENT_BUFFERS | INSTRUMENT_TIMER)) + if (instrument_options & (INSTRUMENT_BUFFERS | INSTRUMENT_TIMER | INSTRUMENT_WAL)) { bool need_buffers = (instrument_options & INSTRUMENT_BUFFERS) != 0; + bool need_wal = (instrument_options & INSTRUMENT_WAL) != 0; bool need_timer = (instrument_options & INSTRUMENT_TIMER) != 0; int i; for (i = 0; i < n; i++) { instr[i].need_bufusage = need_buffers; + instr[i].need_walusage = need_wal; instr[i].need_timer = need_timer; } } @@ -53,6 +58,7 @@ InstrInit(Instrumentation *instr, int instrument_options) { memset(instr, 0, sizeof(Instrumentation)); instr->need_bufusage = (instrument_options & INSTRUMENT_BUFFERS) != 0; + instr->need_walusage = (instrument_options & INSTRUMENT_WAL) != 0; instr->need_timer = (instrument_options & INSTRUMENT_TIMER) != 0; } @@ -67,6 +73,9 @@ InstrStartNode(Instrumentation *instr) /* save buffer usage totals at node entry, if needed */ if (instr->need_bufusage) instr->bufusage_start = pgBufferUsage; + + if (instr->need_walusage) + instr->walusage_start = pgWalUsage; } /* Exit from a plan node */ @@ -95,6 +104,10 @@ InstrStopNode(Instrumentation *instr, double nTuples) BufferUsageAccumDiff(&instr->bufusage, &pgBufferUsage, &instr->bufusage_start); + if (instr->need_walusage) + WalUsageAccumDiff(&instr->walusage, + &pgWalUsage, &instr->walusage_start); + /* Is this the first tuple of this cycle? */ if (!instr->running) { @@ -158,6 +171,9 @@ InstrAggNode(Instrumentation *dst, Instrumentation *add) /* Add delta of buffer usage since entry to node's totals */ if (dst->need_bufusage) BufferUsageAdd(&dst->bufusage, &add->bufusage); + + if (dst->need_walusage) + WalUsageAdd(&dst->walusage, &add->walusage); } /* note current values during parallel executor startup */ @@ -165,21 +181,29 @@ void InstrStartParallelQuery(void) { save_pgBufferUsage = pgBufferUsage; + save_pgWalUsage = pgWalUsage; } /* report usage after parallel executor shutdown */ void -InstrEndParallelQuery(BufferUsage *result) +InstrEndParallelQuery(BufferUsage *bufusage, WalUsage *walusage) { - memset(result, 0, sizeof(BufferUsage)); - BufferUsageAccumDiff(result, &pgBufferUsage, &save_pgBufferUsage); + if (bufusage) + { + memset(bufusage, 0, sizeof(BufferUsage)); + BufferUsageAccumDiff(bufusage, &pgBufferUsage, &save_pgBufferUsage); + } + memset(walusage, 0, sizeof(WalUsage)); + WalUsageAccumDiff(walusage, &pgWalUsage, &save_pgWalUsage); } /* accumulate work done by workers in leader's stats */ void -InstrAccumParallelQuery(BufferUsage *result) +InstrAccumParallelQuery(BufferUsage *bufusage, WalUsage *walusage) { - BufferUsageAdd(&pgBufferUsage, result); + if (bufusage) + BufferUsageAdd(&pgBufferUsage, bufusage); + WalUsageAdd(&pgWalUsage, walusage); } /* dst += add */ @@ -221,3 +245,20 @@ BufferUsageAccumDiff(BufferUsage *dst, INSTR_TIME_ACCUM_DIFF(dst->blk_write_time, add->blk_write_time, sub->blk_write_time); } + +/* helper functions for WAL usage accumulation */ +static void +WalUsageAdd(WalUsage *dst, WalUsage *add) +{ + dst->wal_bytes += add->wal_bytes; + dst->wal_records += add->wal_records; + dst->wal_num_fpw += add->wal_num_fpw; +} + +void +WalUsageAccumDiff(WalUsage *dst, const WalUsage *add, const WalUsage *sub) +{ + dst->wal_bytes += add->wal_bytes - sub->wal_bytes; + dst->wal_records += add->wal_records - sub->wal_records; + dst->wal_num_fpw += add->wal_num_fpw - sub->wal_num_fpw; +} diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h index 9ec7b31cce..b91e724b2d 100644 --- a/src/include/access/xlog.h +++ b/src/include/access/xlog.h @@ -259,7 +259,8 @@ struct XLogRecData; extern XLogRecPtr XLogInsertRecord(struct XLogRecData *rdata, XLogRecPtr fpw_lsn, - uint8 flags); + uint8 flags, + int num_fpw); extern void XLogFlush(XLogRecPtr RecPtr); extern bool XLogBackgroundFlush(void); extern bool XLogNeedsFlush(XLogRecPtr RecPtr); diff --git a/src/include/executor/execParallel.h b/src/include/executor/execParallel.h index 17d07cf020..1cc5b524fd 100644 --- a/src/include/executor/execParallel.h +++ b/src/include/executor/execParallel.h @@ -26,6 +26,7 @@ typedef struct ParallelExecutorInfo PlanState *planstate; /* plan subtree we're running in parallel */ ParallelContext *pcxt; /* parallel context we're using */ BufferUsage *buffer_usage; /* points to bufusage area in DSM */ + WalUsage *wal_usage; /* walusage area in DSM */ SharedExecutorInstrumentation *instrumentation; /* optional */ struct SharedJitInstrumentation *jit_instrumentation; /* optional */ dsa_area *area; /* points to DSA area in DSM */ diff --git a/src/include/executor/instrument.h b/src/include/executor/instrument.h index 3825a5ac1f..e8875a8e9b 100644 --- a/src/include/executor/instrument.h +++ b/src/include/executor/instrument.h @@ -32,12 +32,20 @@ typedef struct BufferUsage instr_time blk_write_time; /* time spent writing */ } BufferUsage; +typedef struct WalUsage +{ + long wal_records; /* # of WAL records produced */ + long wal_num_fpw; /* # of WAL full page image produced */ + uint64 wal_bytes; /* size of WAL records produced */ +} WalUsage; + /* Flag bits included in InstrAlloc's instrument_options bitmask */ typedef enum InstrumentOption { INSTRUMENT_TIMER = 1 << 0, /* needs timer (and row counts) */ INSTRUMENT_BUFFERS = 1 << 1, /* needs buffer usage */ INSTRUMENT_ROWS = 1 << 2, /* needs row count */ + INSTRUMENT_WAL = 1 << 3, /* needs WAL usage */ INSTRUMENT_ALL = PG_INT32_MAX } InstrumentOption; @@ -46,6 +54,7 @@ typedef struct Instrumentation /* Parameters set at node creation: */ bool need_timer; /* true if we need timer data */ bool need_bufusage; /* true if we need buffer usage data */ + bool need_walusage; /* true if we need WAL usage data */ /* Info about current plan cycle: */ bool running; /* true if we've completed first tuple */ instr_time starttime; /* start time of current iteration of node */ @@ -53,6 +62,7 @@ typedef struct Instrumentation double firsttuple; /* time for first tuple of this cycle */ double tuplecount; /* # of tuples emitted so far this cycle */ BufferUsage bufusage_start; /* buffer usage at start */ + WalUsage walusage_start; /* WAL usage at start */ /* Accumulated statistics across all completed cycles: */ double startup; /* total startup time (in seconds) */ double total; /* total time (in seconds) */ @@ -62,6 +72,7 @@ typedef struct Instrumentation double nfiltered1; /* # of tuples removed by scanqual or joinqual */ double nfiltered2; /* # of tuples removed by "other" quals */ BufferUsage bufusage; /* total buffer usage */ + WalUsage walusage; /* total WAL usage */ } Instrumentation; typedef struct WorkerInstrumentation @@ -71,6 +82,7 @@ typedef struct WorkerInstrumentation } WorkerInstrumentation; extern PGDLLIMPORT BufferUsage pgBufferUsage; +extern PGDLLIMPORT WalUsage pgWalUsage; extern Instrumentation *InstrAlloc(int n, int instrument_options); extern void InstrInit(Instrumentation *instr, int instrument_options); @@ -79,9 +91,11 @@ extern void InstrStopNode(Instrumentation *instr, double nTuples); extern void InstrEndLoop(Instrumentation *instr); extern void InstrAggNode(Instrumentation *dst, Instrumentation *add); extern void InstrStartParallelQuery(void); -extern void InstrEndParallelQuery(BufferUsage *result); -extern void InstrAccumParallelQuery(BufferUsage *result); +extern void InstrEndParallelQuery(BufferUsage *bufusage, WalUsage *walusage); +extern void InstrAccumParallelQuery(BufferUsage *bufusage, WalUsage *walusage); extern void BufferUsageAccumDiff(BufferUsage *dst, const BufferUsage *add, const BufferUsage *sub); +extern void WalUsageAccumDiff(WalUsage *dst, const WalUsage *add, + const WalUsage *sub); #endif /* INSTRUMENT_H */ diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list index 939de985d3..34623523a7 100644 --- a/src/tools/pgindent/typedefs.list +++ b/src/tools/pgindent/typedefs.list @@ -2643,6 +2643,7 @@ WalSndCtlData WalSndSendDataCallback WalSndState WalTimeSample +WalUsage WalWriteMethod Walfile WindowAgg -- 2.20.1
>From 07cdbb8dd2118ee9d4cfce29a3b596c9def476e8 Mon Sep 17 00:00:00 2001 From: Julien Rouhaud <julien.rouh...@free.fr> Date: Sun, 29 Mar 2020 12:38:14 +0200 Subject: [PATCH v12 2/4] Add option to report WAL usage in EXPLAIN and auto_explain. Author: Julien Rouhaud Reviewed-by: Discussion: https://postgr.es/m/CAB-hujrP8ZfUkvL5OYETipQwA=e3n7oqHFU=4zlxws_cza3...@mail.gmail.com --- contrib/auto_explain/auto_explain.c | 15 ++++++ doc/src/sgml/auto-explain.sgml | 20 ++++++++ doc/src/sgml/ref/explain.sgml | 14 ++++++ src/backend/commands/explain.c | 74 +++++++++++++++++++++++++++-- src/bin/psql/tab-complete.c | 4 +- src/include/commands/explain.h | 3 ++ 6 files changed, 124 insertions(+), 6 deletions(-) diff --git a/contrib/auto_explain/auto_explain.c b/contrib/auto_explain/auto_explain.c index f69dde876c..56c549d84c 100644 --- a/contrib/auto_explain/auto_explain.c +++ b/contrib/auto_explain/auto_explain.c @@ -27,6 +27,7 @@ static int auto_explain_log_min_duration = -1; /* msec or -1 */ static bool auto_explain_log_analyze = false; static bool auto_explain_log_verbose = false; static bool auto_explain_log_buffers = false; +static bool auto_explain_log_wal = false; static bool auto_explain_log_triggers = false; static bool auto_explain_log_timing = true; static bool auto_explain_log_settings = false; @@ -148,6 +149,17 @@ _PG_init(void) NULL, NULL); + DefineCustomBoolVariable("auto_explain.log_wal", + "Log WAL usage.", + NULL, + &auto_explain_log_wal, + false, + PGC_SUSET, + 0, + NULL, + NULL, + NULL); + DefineCustomBoolVariable("auto_explain.log_triggers", "Include trigger statistics in plans.", "This has no effect unless log_analyze is also set.", @@ -280,6 +292,8 @@ explain_ExecutorStart(QueryDesc *queryDesc, int eflags) queryDesc->instrument_options |= INSTRUMENT_ROWS; if (auto_explain_log_buffers) queryDesc->instrument_options |= INSTRUMENT_BUFFERS; + if (auto_explain_log_wal) + queryDesc->instrument_options |= INSTRUMENT_WAL; } } @@ -374,6 +388,7 @@ explain_ExecutorEnd(QueryDesc *queryDesc) es->analyze = (queryDesc->instrument_options && auto_explain_log_analyze); es->verbose = auto_explain_log_verbose; es->buffers = (es->analyze && auto_explain_log_buffers); + es->wal = (es->analyze && auto_explain_log_wal); es->timing = (es->analyze && auto_explain_log_timing); es->summary = es->analyze; es->format = auto_explain_log_format; diff --git a/doc/src/sgml/auto-explain.sgml b/doc/src/sgml/auto-explain.sgml index 3d619d4a3d..d4d29c4a64 100644 --- a/doc/src/sgml/auto-explain.sgml +++ b/doc/src/sgml/auto-explain.sgml @@ -109,6 +109,26 @@ LOAD 'auto_explain'; </listitem> </varlistentry> + <varlistentry> + <term> + <varname>auto_explain.log_wal</varname> (<type>boolean</type>) + <indexterm> + <primary><varname>auto_explain.log_wal</varname> configuration parameter</primary> + </indexterm> + </term> + <listitem> + <para> + <varname>auto_explain.log_wal</varname> controls whether WAL + usage statistics are printed when an execution plan is logged; it's + equivalent to the <literal>WAL</literal> option of <command>EXPLAIN</command>. + This parameter has no effect + unless <varname>auto_explain.log_analyze</varname> is enabled. + This parameter is off by default. + Only superusers can change this setting. + </para> + </listitem> + </varlistentry> + <varlistentry> <term> <varname>auto_explain.log_timing</varname> (<type>boolean</type>) diff --git a/doc/src/sgml/ref/explain.sgml b/doc/src/sgml/ref/explain.sgml index 385d10411f..494e60ecc9 100644 --- a/doc/src/sgml/ref/explain.sgml +++ b/doc/src/sgml/ref/explain.sgml @@ -41,6 +41,7 @@ EXPLAIN [ ANALYZE ] [ VERBOSE ] <replaceable class="parameter">statement</replac COSTS [ <replaceable class="parameter">boolean</replaceable> ] SETTINGS [ <replaceable class="parameter">boolean</replaceable> ] BUFFERS [ <replaceable class="parameter">boolean</replaceable> ] + WAL [ <replaceable class="parameter">boolean</replaceable> ] TIMING [ <replaceable class="parameter">boolean</replaceable> ] SUMMARY [ <replaceable class="parameter">boolean</replaceable> ] FORMAT { TEXT | XML | JSON | YAML } @@ -192,6 +193,19 @@ ROLLBACK; </listitem> </varlistentry> + <varlistentry> + <term><literal>WAL</literal></term> + <listitem> + <para> + Include information on WAL record generation. Specifically, include the + number of records, full page images and bytes generated. In text + format, only non-zero values are printed. This parameter may only be + used when <literal>ANALYZE</literal> is also enabled. It defaults to + <literal>FALSE</literal>. + </para> + </listitem> + </varlistentry> + <varlistentry> <term><literal>TIMING</literal></term> <listitem> diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c index ee0e638f33..cefe2144e5 100644 --- a/src/backend/commands/explain.c +++ b/src/backend/commands/explain.c @@ -113,6 +113,7 @@ static void show_foreignscan_info(ForeignScanState *fsstate, ExplainState *es); static void show_eval_params(Bitmapset *bms_params, ExplainState *es); static const char *explain_get_index_name(Oid indexId); static void show_buffer_usage(ExplainState *es, const BufferUsage *usage); +static void show_wal_usage(ExplainState *es, const WalUsage *usage); static void ExplainIndexScanDetails(Oid indexid, ScanDirection indexorderdir, ExplainState *es); static void ExplainScanTarget(Scan *plan, ExplainState *es); @@ -175,6 +176,8 @@ ExplainQuery(ParseState *pstate, ExplainStmt *stmt, es->costs = defGetBoolean(opt); else if (strcmp(opt->defname, "buffers") == 0) es->buffers = defGetBoolean(opt); + else if (strcmp(opt->defname, "wal") == 0) + es->wal = defGetBoolean(opt); else if (strcmp(opt->defname, "settings") == 0) es->settings = defGetBoolean(opt); else if (strcmp(opt->defname, "timing") == 0) @@ -219,6 +222,11 @@ ExplainQuery(ParseState *pstate, ExplainStmt *stmt, (errcode(ERRCODE_INVALID_PARAMETER_VALUE), errmsg("EXPLAIN option BUFFERS requires ANALYZE"))); + if (es->wal && !es->analyze) + ereport(ERROR, + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("EXPLAIN option WAL requires ANALYZE"))); + /* if the timing was not set explicitly, set default value */ es->timing = (timing_set) ? es->timing : es->analyze; @@ -494,6 +502,8 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es, if (es->buffers) instrument_option |= INSTRUMENT_BUFFERS; + if (es->wal) + instrument_option |= INSTRUMENT_WAL; /* * We always collect timing for the entire statement, even when node-level @@ -1942,12 +1952,14 @@ ExplainNode(PlanState *planstate, List *ancestors, } } - /* Show buffer usage */ + /* Show buffer/WAL usage */ if (es->buffers && planstate->instrument) show_buffer_usage(es, &planstate->instrument->bufusage); + if (es->wal && planstate->instrument) + show_wal_usage(es, &planstate->instrument->walusage); - /* Prepare per-worker buffer usage */ - if (es->workers_state && es->buffers && es->verbose) + /* Prepare per-worker buffer/WAL usage */ + if (es->workers_state && (es->buffers || es->wal) && es->verbose) { WorkerInstrumentation *w = planstate->worker_instrument; @@ -1960,7 +1972,10 @@ ExplainNode(PlanState *planstate, List *ancestors, continue; ExplainOpenWorker(n, es); - show_buffer_usage(es, &instrument->bufusage); + if (es->buffers) + show_buffer_usage(es, &instrument->bufusage); + if (es->wal) + show_wal_usage(es, &instrument->walusage); ExplainCloseWorker(n, es); } } @@ -3059,6 +3074,44 @@ show_buffer_usage(ExplainState *es, const BufferUsage *usage) } } +/* + * Show WAL usage details. + */ +static void +show_wal_usage(ExplainState *es, const WalUsage *usage) +{ + if (es->format == EXPLAIN_FORMAT_TEXT) + { + /* Show only positive counter values. */ + if ((usage->wal_records > 0) || (usage->wal_num_fpw > 0) || + (usage->wal_bytes > 0)) + { + ExplainIndentText(es); + appendStringInfoString(es->str, "WAL:"); + + if (usage->wal_records > 0) + appendStringInfo(es->str, " records=%ld", + usage->wal_records); + if (usage->wal_num_fpw > 0) + appendStringInfo(es->str, " full page writes=%ld", + usage->wal_num_fpw); + if (usage->wal_bytes > 0) + appendStringInfo(es->str, " bytes=" UINT64_FORMAT, + usage->wal_bytes); + appendStringInfoChar(es->str, '\n'); + } + } + else + { + ExplainPropertyInteger("WAL records", NULL, + usage->wal_records, es); + ExplainPropertyInteger("WAL full page writes", NULL, + usage->wal_num_fpw, es); + ExplainPropertyUInteger("WAL bytes", NULL, + usage->wal_bytes, es); + } +} + /* * Add some additional details about an IndexScan or IndexOnlyScan */ @@ -3843,6 +3896,19 @@ ExplainPropertyInteger(const char *qlabel, const char *unit, int64 value, ExplainProperty(qlabel, unit, buf, true, es); } +/* + * Explain an unsigned integer-valued property. + */ +void +ExplainPropertyUInteger(const char *qlabel, const char *unit, uint64 value, + ExplainState *es) +{ + char buf[32]; + + snprintf(buf, sizeof(buf), UINT64_FORMAT, value); + ExplainProperty(qlabel, unit, buf, true, es); +} + /* * Explain a float-valued property, using the specified number of * fractional digits. diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c index 5fec59723c..0e7a373caf 100644 --- a/src/bin/psql/tab-complete.c +++ b/src/bin/psql/tab-complete.c @@ -3045,8 +3045,8 @@ psql_completion(const char *text, int start, int end) */ if (ends_with(prev_wd, '(') || ends_with(prev_wd, ',')) COMPLETE_WITH("ANALYZE", "VERBOSE", "COSTS", "SETTINGS", - "BUFFERS", "TIMING", "SUMMARY", "FORMAT"); - else if (TailMatches("ANALYZE|VERBOSE|COSTS|SETTINGS|BUFFERS|TIMING|SUMMARY")) + "BUFFERS", "WAL", "TIMING", "SUMMARY", "FORMAT"); + else if (TailMatches("ANALYZE|VERBOSE|COSTS|SETTINGS|BUFFERS|WAL|TIMING|SUMMARY")) COMPLETE_WITH("ON", "OFF"); else if (TailMatches("FORMAT")) COMPLETE_WITH("TEXT", "XML", "JSON", "YAML"); diff --git a/src/include/commands/explain.h b/src/include/commands/explain.h index 54f6240e5e..7b0b0a94a6 100644 --- a/src/include/commands/explain.h +++ b/src/include/commands/explain.h @@ -42,6 +42,7 @@ typedef struct ExplainState bool analyze; /* print actual times */ bool costs; /* print estimated costs */ bool buffers; /* print buffer usage */ + bool wal; /* print WAL usage */ bool timing; /* print detailed node timing */ bool summary; /* print total planning and execution timing */ bool settings; /* print modified settings */ @@ -110,6 +111,8 @@ extern void ExplainPropertyText(const char *qlabel, const char *value, ExplainState *es); extern void ExplainPropertyInteger(const char *qlabel, const char *unit, int64 value, ExplainState *es); +extern void ExplainPropertyUInteger(const char *qlabel, const char *unit, + uint64 value, ExplainState *es); extern void ExplainPropertyFloat(const char *qlabel, const char *unit, double value, int ndigits, ExplainState *es); extern void ExplainPropertyBool(const char *qlabel, bool value, -- 2.20.1
>From ab9698684cc31df460b0e6993636fa4359d78402 Mon Sep 17 00:00:00 2001 From: Kirill Bychik <kirill.byc...@gmail.com> Date: Tue, 17 Mar 2020 14:42:02 +0100 Subject: [PATCH v12 3/4] Keep track of WAL usage in pg_stat_statements. Author: Kirill Bychik Reviewed-by: Julien Rouhaud, Fuji Masao Discussion: https://postgr.es/m/CAB-hujrP8ZfUkvL5OYETipQwA=e3n7oqHFU=4zlxws_cza3...@mail.gmail.com --- .../expected/pg_stat_statements.out | 39 +++++++++++++ .../pg_stat_statements--1.7--1.8.sql | 5 +- .../pg_stat_statements/pg_stat_statements.c | 55 +++++++++++++++++-- .../sql/pg_stat_statements.sql | 23 ++++++++ doc/src/sgml/pgstatstatements.sgml | 27 +++++++++ 5 files changed, 144 insertions(+), 5 deletions(-) diff --git a/contrib/pg_stat_statements/expected/pg_stat_statements.out b/contrib/pg_stat_statements/expected/pg_stat_statements.out index 45dbe9e677..02da7245b4 100644 --- a/contrib/pg_stat_statements/expected/pg_stat_statements.out +++ b/contrib/pg_stat_statements/expected/pg_stat_statements.out @@ -211,6 +211,45 @@ SELECT query, calls, rows FROM pg_stat_statements ORDER BY query COLLATE "C"; UPDATE test SET b = $1 WHERE a > $2 | 1 | 3 (10 rows) +-- +-- CRUD: INSERT SELECT UPDATE DELETE on test non-temp table to validate WAL generation metrics +-- +SELECT pg_stat_statements_reset(); + pg_stat_statements_reset +-------------------------- + +(1 row) + +-- utility "create table" should not be shown +CREATE TABLE pgss_test (a int, b char(20)); +INSERT INTO pgss_test VALUES(generate_series(1, 10), 'aaa'); +UPDATE pgss_test SET b = 'bbb' WHERE a > 7; +DELETE FROM pgss_test WHERE a > 9; +-- DROP test table +SET pg_stat_statements.track_utility = TRUE; +DROP TABLE pgss_test; +SET pg_stat_statements.track_utility = FALSE; +-- SELECT usage data, check WAL usage is reported, wal_records equal rows count for INSERT/UPDATE/DELETE +SELECT query, calls, rows, +wal_bytes > 0 as wal_bytes_generated, +wal_records > 0 as wal_records_generated, +wal_records = rows as wal_records_as_rows +FROM pg_stat_statements ORDER BY query COLLATE "C"; + query | calls | rows | wal_bytes_generated | wal_records_generated | wal_records_as_rows +-----------------------------------------------------------+-------+------+---------------------+-----------------------+--------------------- + DELETE FROM pgss_test WHERE a > $1 | 1 | 1 | t | t | t + DROP TABLE pgss_test | 1 | 0 | t | t | f + INSERT INTO pgss_test VALUES(generate_series($1, $2), $3) | 1 | 10 | t | t | t + SELECT pg_stat_statements_reset() | 1 | 1 | f | f | f + SELECT query, calls, rows, +| 0 | 0 | f | f | t + wal_bytes > $1 as wal_bytes_generated, +| | | | | + wal_records > $2 as wal_records_generated, +| | | | | + wal_records = rows as wal_records_as_rows +| | | | | + FROM pg_stat_statements ORDER BY query COLLATE "C" | | | | | + SET pg_stat_statements.track_utility = FALSE | 1 | 0 | f | f | t + UPDATE pgss_test SET b = $1 WHERE a > $2 | 1 | 3 | t | t | t +(7 rows) + -- -- pg_stat_statements.track = none -- diff --git a/contrib/pg_stat_statements/pg_stat_statements--1.7--1.8.sql b/contrib/pg_stat_statements/pg_stat_statements--1.7--1.8.sql index 60d454db7f..2fcf7aee01 100644 --- a/contrib/pg_stat_statements/pg_stat_statements--1.7--1.8.sql +++ b/contrib/pg_stat_statements/pg_stat_statements--1.7--1.8.sql @@ -41,7 +41,10 @@ CREATE FUNCTION pg_stat_statements(IN showtext boolean, OUT temp_blks_read int8, OUT temp_blks_written int8, OUT blk_read_time float8, - OUT blk_write_time float8 + OUT blk_write_time float8, + OUT wal_bytes numeric, + OUT wal_records int8, + OUT wal_num_fpw int8 ) RETURNS SETOF record AS 'MODULE_PATHNAME', 'pg_stat_statements_1_8' diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c index 942922b01f..f8bf4f852a 100644 --- a/contrib/pg_stat_statements/pg_stat_statements.c +++ b/contrib/pg_stat_statements/pg_stat_statements.c @@ -185,6 +185,9 @@ typedef struct Counters int64 local_blks_written; /* # of local disk blocks written */ int64 temp_blks_read; /* # of temp blocks read */ int64 temp_blks_written; /* # of temp blocks written */ + uint64 wal_bytes; /* total amount of WAL bytes generated */ + int64 wal_records; /* # of WAL records generated */ + int64 wal_num_fpw; /* # of WAL full page image generated */ double blk_read_time; /* time spent reading, in msec */ double blk_write_time; /* time spent writing, in msec */ double usage; /* usage factor */ @@ -348,6 +351,7 @@ static void pgss_store(const char *query, uint64 queryId, pgssStoreKind kind, double total_time, uint64 rows, const BufferUsage *bufusage, + const WalUsage *walusage, pgssJumbleState *jstate); static void pg_stat_statements_internal(FunctionCallInfo fcinfo, pgssVersion api_version, @@ -891,6 +895,7 @@ pgss_post_parse_analyze(ParseState *pstate, Query *query) 0, 0, NULL, + NULL, &jstate); } @@ -926,9 +931,16 @@ pgss_planner(Query *parse, instr_time duration; BufferUsage bufusage_start, bufusage; + WalUsage walusage_start, + walusage; /* We need to track buffer usage as the planner can access them. */ bufusage_start = pgBufferUsage; + /* + * Similarly the planner could write some WAL records in some cases + * (e.g. setting a hint bit with those being WAL-logged) + */ + walusage_start = pgWalUsage; INSTR_TIME_SET_CURRENT(start); plan_nested_level++; @@ -954,6 +966,10 @@ pgss_planner(Query *parse, memset(&bufusage, 0, sizeof(BufferUsage)); BufferUsageAccumDiff(&bufusage, &pgBufferUsage, &bufusage_start); + /* calc differences of WAL counters. */ + memset(&walusage, 0, sizeof(WalUsage)); + WalUsageAccumDiff(&walusage, &pgWalUsage, &walusage_start); + pgss_store(query_string, parse->queryId, parse->stmt_location, @@ -962,6 +978,7 @@ pgss_planner(Query *parse, INSTR_TIME_GET_MILLISEC(duration), 0, &bufusage, + &walusage, NULL); } else @@ -1079,6 +1096,7 @@ pgss_ExecutorEnd(QueryDesc *queryDesc) queryDesc->totaltime->total * 1000.0, /* convert to msec */ queryDesc->estate->es_processed, &queryDesc->totaltime->bufusage, + &queryDesc->totaltime->walusage, NULL); } @@ -1123,8 +1141,10 @@ pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString, uint64 rows; BufferUsage bufusage_start, bufusage; - + WalUsage walusage_start, + walusage; bufusage_start = pgBufferUsage; + walusage_start = pgWalUsage; INSTR_TIME_SET_CURRENT(start); exec_nested_level++; @@ -1154,6 +1174,10 @@ pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString, memset(&bufusage, 0, sizeof(BufferUsage)); BufferUsageAccumDiff(&bufusage, &pgBufferUsage, &bufusage_start); + /* calc differences of WAL counters. */ + memset(&walusage, 0, sizeof(WalUsage)); + WalUsageAccumDiff(&walusage, &pgWalUsage, &walusage_start); + pgss_store(queryString, 0, /* signal that it's a utility stmt */ pstmt->stmt_location, @@ -1162,6 +1186,7 @@ pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString, INSTR_TIME_GET_MILLISEC(duration), rows, &bufusage, + &walusage, NULL); } else @@ -1197,7 +1222,8 @@ pgss_hash_string(const char *str, int len) * * If jstate is not NULL then we're trying to create an entry for which * we have no statistics as yet; we just want to record the normalized - * query string. total_time, rows, bufusage are ignored in this case. + * query string. total_time, rows, bufusage and walusage are ignored in this + * case. * * If kind is PGSS_PLAN or PGSS_EXEC, its value is used as the array position * for the arrays in the Counters field. @@ -1208,6 +1234,7 @@ pgss_store(const char *query, uint64 queryId, pgssStoreKind kind, double total_time, uint64 rows, const BufferUsage *bufusage, + const WalUsage *walusage, pgssJumbleState *jstate) { pgssHashKey key; @@ -1402,6 +1429,9 @@ pgss_store(const char *query, uint64 queryId, e->counters.blk_read_time += INSTR_TIME_GET_MILLISEC(bufusage->blk_read_time); e->counters.blk_write_time += INSTR_TIME_GET_MILLISEC(bufusage->blk_write_time); e->counters.usage += USAGE_EXEC(total_time); + e->counters.wal_bytes += walusage->wal_bytes; + e->counters.wal_records += walusage->wal_records; + e->counters.wal_num_fpw += walusage->wal_num_fpw; SpinLockRelease(&e->mutex); } @@ -1449,8 +1479,8 @@ pg_stat_statements_reset(PG_FUNCTION_ARGS) #define PG_STAT_STATEMENTS_COLS_V1_1 18 #define PG_STAT_STATEMENTS_COLS_V1_2 19 #define PG_STAT_STATEMENTS_COLS_V1_3 23 -#define PG_STAT_STATEMENTS_COLS_V1_8 29 -#define PG_STAT_STATEMENTS_COLS 29 /* maximum of above */ +#define PG_STAT_STATEMENTS_COLS_V1_8 32 +#define PG_STAT_STATEMENTS_COLS 32 /* maximum of above */ /* * Retrieve statement statistics. @@ -1786,6 +1816,23 @@ pg_stat_statements_internal(FunctionCallInfo fcinfo, values[i++] = Float8GetDatumFast(tmp.blk_read_time); values[i++] = Float8GetDatumFast(tmp.blk_write_time); } + if (api_version >= PGSS_V1_8) + { + char buf[256]; + Datum wal_bytes; + + snprintf(buf, sizeof buf, UINT64_FORMAT, tmp.wal_bytes); + + /* Convert to numeric. */ + wal_bytes = DirectFunctionCall3(numeric_in, + CStringGetDatum(buf), + ObjectIdGetDatum(0), + Int32GetDatum(-1)); + + values[i++] = wal_bytes; + values[i++] = Int64GetDatumFast(tmp.wal_records); + values[i++] = Int64GetDatumFast(tmp.wal_num_fpw); + } Assert(i == (api_version == PGSS_V1_0 ? PG_STAT_STATEMENTS_COLS_V1_0 : api_version == PGSS_V1_1 ? PG_STAT_STATEMENTS_COLS_V1_1 : diff --git a/contrib/pg_stat_statements/sql/pg_stat_statements.sql b/contrib/pg_stat_statements/sql/pg_stat_statements.sql index 435d51008f..eaacd4021a 100644 --- a/contrib/pg_stat_statements/sql/pg_stat_statements.sql +++ b/contrib/pg_stat_statements/sql/pg_stat_statements.sql @@ -101,6 +101,29 @@ SELECT * FROM test WHERE a IN (1, 2, 3, 4, 5); SELECT query, calls, rows FROM pg_stat_statements ORDER BY query COLLATE "C"; +-- +-- CRUD: INSERT SELECT UPDATE DELETE on test non-temp table to validate WAL generation metrics +-- +SELECT pg_stat_statements_reset(); + +-- utility "create table" should not be shown +CREATE TABLE pgss_test (a int, b char(20)); + +INSERT INTO pgss_test VALUES(generate_series(1, 10), 'aaa'); +UPDATE pgss_test SET b = 'bbb' WHERE a > 7; +DELETE FROM pgss_test WHERE a > 9; +-- DROP test table +SET pg_stat_statements.track_utility = TRUE; +DROP TABLE pgss_test; +SET pg_stat_statements.track_utility = FALSE; + +-- SELECT usage data, check WAL usage is reported, wal_records equal rows count for INSERT/UPDATE/DELETE +SELECT query, calls, rows, +wal_bytes > 0 as wal_bytes_generated, +wal_records > 0 as wal_records_generated, +wal_records = rows as wal_records_as_rows +FROM pg_stat_statements ORDER BY query COLLATE "C"; + -- -- pg_stat_statements.track = none -- diff --git a/doc/src/sgml/pgstatstatements.sgml b/doc/src/sgml/pgstatstatements.sgml index b4df84c60b..3d26108649 100644 --- a/doc/src/sgml/pgstatstatements.sgml +++ b/doc/src/sgml/pgstatstatements.sgml @@ -264,6 +264,33 @@ </entry> </row> + <row> + <entry><structfield>wal_bytes</structfield></entry> + <entry><type>numeric</type></entry> + <entry></entry> + <entry> + Total amount of WAL bytes generated by the statement + </entry> + </row> + + <row> + <entry><structfield>wal_records</structfield></entry> + <entry><type>bigint</type></entry> + <entry></entry> + <entry> + Total count of WAL records generated by the statement + </entry> + </row> + + <row> + <entry><structfield>wal_num_fpw</structfield></entry> + <entry><type>bigint</type></entry> + <entry></entry> + <entry> + Total count of WAL full page writes generated by the statement + </entry> + </row> + </tbody> </tgroup> </table> -- 2.20.1
>From 5adcd8bf0af96363974e93a3f03c2e509cf92fcb Mon Sep 17 00:00:00 2001 From: Julien Rouhaud <julien.rouh...@free.fr> Date: Thu, 19 Mar 2020 16:08:47 +0100 Subject: [PATCH v12 4/4] Expose WAL usage counters in verbose (auto)vacuum output. Author: Julien Rouhaud Reviewed-by: Fuji Masao Discussion: https://postgr.es/m/CAB-hujrP8ZfUkvL5OYETipQwA=e3n7oqHFU=4zlxws_cza3...@mail.gmail.com --- src/backend/access/heap/vacuumlazy.c | 24 +++++++++++++++++++++++- 1 file changed, 23 insertions(+), 1 deletion(-) diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c index cc7e8521a5..735087dd74 100644 --- a/src/backend/access/heap/vacuumlazy.c +++ b/src/backend/access/heap/vacuumlazy.c @@ -410,6 +410,8 @@ heap_vacuum_rel(Relation onerel, VacuumParams *params, int nindexes; PGRUsage ru0; TimestampTz starttime = 0; + WalUsage walusage_start = pgWalUsage; + WalUsage walusage = {0, 0, 0}; long secs; int usecs; double read_rate, @@ -614,6 +616,9 @@ heap_vacuum_rel(Relation onerel, VacuumParams *params, TimestampDifference(starttime, endtime, &secs, &usecs); + memset(&walusage, 0, sizeof(WalUsage)); + WalUsageAccumDiff(&walusage, &pgWalUsage, &walusage_start); + read_rate = 0; write_rate = 0; if ((secs > 0) || (usecs > 0)) @@ -666,7 +671,13 @@ heap_vacuum_rel(Relation onerel, VacuumParams *params, (long long) VacuumPageDirty); appendStringInfo(&buf, _("avg read rate: %.3f MB/s, avg write rate: %.3f MB/s\n"), read_rate, write_rate); - appendStringInfo(&buf, _("system usage: %s"), pg_rusage_show(&ru0)); + appendStringInfo(&buf, _("system usage: %s\n"), pg_rusage_show(&ru0)); + appendStringInfo(&buf, + _("WAL usage: %ld records, %ld full page writes, " + UINT64_FORMAT " bytes"), + walusage.wal_records, + walusage.wal_num_fpw, + walusage.wal_bytes); ereport(LOG, (errmsg_internal("%s", buf.data))); @@ -758,6 +769,8 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats, IndexBulkDeleteResult **indstats; int i; PGRUsage ru0; + WalUsage walusage_start = pgWalUsage; + WalUsage walusage = {0, 0, 0}; Buffer vmbuffer = InvalidBuffer; BlockNumber next_unskippable_block; bool skipping_blocks; @@ -1727,6 +1740,15 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats, "%u pages are entirely empty.\n", empty_pages), empty_pages); + + memset(&walusage, 0, sizeof(WalUsage)); + WalUsageAccumDiff(&walusage, &pgWalUsage, &walusage_start); + appendStringInfo(&buf, _("%ld WAL records, %ld WAL full page writes, " + UINT64_FORMAT " WAL bytes\n"), + walusage.wal_records, + walusage.wal_num_fpw, + walusage.wal_bytes); + appendStringInfo(&buf, _("%s."), pg_rusage_show(&ru0)); ereport(elevel, -- 2.20.1