Proposal: Progressive explain

Rafael Thofehrn Castro Sun, 29 Dec 2024 17:19:27 -0800

Hello community,

CONTEXT:


Back in October I presented the talk "Debugging active queries with
mid-flight instrumented explain plans" at PGConf EU 2024
(recording: https://www.youtube.com/watch?v=6ahTb-7C05c) presenting
an experimental feature that enables visualization of in progress
EXPLAIN ANALYZE executions. Given the positive feedback and requests,
I am sending this patch with the feature, which I am calling Progressive
Explain.

PROPOSAL:

This proposal introduces a feature to print execution plans of active
queries in an in-memory shared hash object so that other sessions can
visualize them with a new view: pg_stat_progress_explain.

Plans are only printed if the new GUC parameter progressive_explain is
enabled.

For regular queries or queries started with EXPLAIN (without ANALYZE)
the plan is printed only once at the start.

For instrumented runs (started via EXPLAIN ANALYZE or when auto_explain
flag log_analyze is enabled), the plan is printed on a fixed interval
controlled by the new GUC parameter progressive_explain_interval. This plan
includes all instrumentation stats computed so far (per node rows and
execution time).

New view:
- pg_stat_progress_explain
  - pid: PID of the process running the query
  - last_explain: timestamp when plan was last printed
  - explain_count: amount of times plan was printed
  - total_explain_time: accumulated time spend printing plans (in ms)
  - explain: the actual plan (limited read privileges)

New GUCs:
- progressive_explain: if progressive plans are printed for local
session.
  - type: bool
  - default: off
  - context: user
- progressive_explain_interval: interval between each explain print.
  - type: int
  - default: 1s
  - min: 10ms
  - context: user
- progressive_explain_sample_rate: fraction of rows processed by the
query until progressive_explain_interval is evaluated to print a
progressive plan
  - type: floating point
  - default: 0.1
  - range: (0.0 - 1.0)
  - context: user
- progressive_explain_output_size: max output size of the plan
printed in the in-memory hash table.
  - type: int
  - default: 4096
  - min: 100
  - context: postmaster
- progressive_explain_format: format used to print the plans.
  - type: enum
  - default: text
  - context: user
- progressive_explain_settings: controls whether information about
modified configuration is added to the printed plan.
  - type: bool
  - default: off
  - context: user
- progressive_explain_verbose: controls whether verbose details are
added to the printed plan.
  - type: bool
  - default: off
  - context: user

DEMONSTRATION:

postgres=# SET progressive_explain = ON;
SET
postgres=# EXPLAIN ANALYZE SELECT *
FROM test t1
UNION ALL
SELECT *
FROM test t1;

postgres=# select * from pg_stat_progress_explain;
-[ RECORD 1
]------+---------------------------------------------------------------------------------------------------------------------------------------
pid                | 299663
last_explain       | 2024-12-29 22:40:33.016833+00
explain_count      | 5
total_explain_time | 0.205
explain            | Append  (cost=0.00..466670.40 rows=20000160 width=37)
(actual time=0.052..3372.318 rows=14013813 loops=1)
                   |   Buffers: shared hit=4288 read=112501
                   |   ->  Seq Scan on test t1  (cost=0.00..183334.80
rows=10000080 width=37) (actual time=0.052..1177.428 rows=10000000 loops=1)
                   |         Buffers: shared hit=4288 read=79046
                   |   ->  Seq Scan on test t1_1  (cost=0.00..183334.80
rows=10000080 width=37) (actual time=0.072..608.481 rows=4013813 loops=1)
(current)
                   |         Buffers: shared read=33455
                   |

IMPLEMENTATION HIGHLIGHTS:

- The initial plan is printed at the end of standard_ExecutorStart
if progressive_explain is enabled, for both regular queries and
instrumented ones (EXPLAIN ANALYZE):

/*
* Start progressive explain if enabled.
*/
if (progressive_explain)
  ProgressiveExplainBegin(queryDesc);

- The incremental plan print for instrumented runs uses a dedicated
ExecProcNode if progressive_explain is enabled:

if (node->instrument)
  if (progressive_explain)
    node->ExecProcNode = ExecProcNodeInstrExplain;
  else
    node->ExecProcNode = ExecProcNodeInstr;
else
  node->ExecProcNode = node->ExecProcNodeReal;

- ExecProcNodeInstrExplain is identical to ExecProcNodeInstr with an
additional part to print plans based on a sampling logic:

/*
* Update progressive explain based on sampling.
*/
if (pg_prng_double(&pg_global_prng_state) < progressive_explain_sample_rate)
  ProgressiveExplainUpdate(node);

That logic was added because ExecProcNodeInstrExplain is called once per
row processed (a lot of times) and performing the timestamp interval
check with progressive_explain_interval to decide whether to print
the plan (done inside ProgressiveExplainUpdate) is expensive. Benchmarks
(shared at the end of this email) show that sampling + timestamp check
gives much better results than performing the timestamp check at every
ExecProcNodeInstrExplain call.

- The plans are stored in a shared hash object (explainArray) allocated
at database start, similar to procArray. ExplainHashShmemSize() computes
shared memory needed for it, based on max_connections + max_parallel_workers
for the amount of elements in the array and progressive_explain_output_size
for the size per element.

- A memory context release callback is configured in the memory context
where the query is running, being responsible for updating explainArray
even when the query doesn't finish gracefully.

- Instrumented plans being printed incrementally need to clone
instrumentation
objects to change them, so each print uses a dedicated memory context
that gets released after the output is constructed. This avoids extended
private memory usage:


/* Dedicated memory context for the current plan being printed */
tmpCxt = AllocSetContextCreate(CurrentMemoryContext,
  "Progressive Explain Temporary Context",
  ALLOCSET_DEFAULT_SIZES);

- A new version of InstrEndLoop (InstrEndLoopForce) was created that allows
to be called targeting in-progress instrumented objects. Those are common
when traversing the plan tree of an active query.

- Column explain from pg_stat_progress_explain can only be visualized by
superusers or the same role that is running the query. If none of those
conditions are met, users will see "<insufficient privilege>".

- For instrumented runs, the printed includes 2 per node modifiers when
appropriate:

  <current>: the plan node currently being processed.
  <never executed>: a plan node not processed yet.

IMPLEMENTATION OVERHEAD:

When not used, the overhead added is:

- One IF at standard_ExecutorStart to check if progressive_explain is
enabled
- For instrumented runs (EXPLAIN ANALYZE), one IF at ExecProcNodeFirst
to define ExecProcNode wrapper

BENCHMARKS:

Performed 3 scenarios of benchmarks:

A) Comparison between unpatched PG18, patched with progressive explain
disabled and patched with feature enabled globally (all queries printing
the plan at query start:

- PG18 without patch:

postgres@ip-172-31-39-191:~$ /usr/local/pgsql/bin/pgbench -S -n -T 120 -c 30
number of transactions actually processed: 2173978
latency average = 1.655 ms
tps = 18127.977529 (without initial connection time)

- PG18 with patch:

-- progressive_explain = off

postgres@ip-172-31-39-191:~$ /usr/local/pgsql/bin/pgbench -S -n -T 120 -c 30
number of transactions actually processed: 2198806
latency average = 1.636 ms
tps = 18333.157809 (without initial connection time)

-- progressive_explain = on (prints plan only once per query)

postgres@ip-172-31-39-191:~$ /usr/local/pgsql/bin/pgbench -S -n -T 120 -c 30
number of transactions actually processed: 2047459
latency average = 1.756 ms
tps = 17081.477199 (without initial connection time)

B) EXPLAIN ANALYZE performance with different progressive_explain_interval
settings in patched:

-- progressive_explain = off

postgres@ip-172-31-39-191:~$ /usr/local/pgsql/bin/pgbench -n -T 120 -c 1 -f
script.sql
number of transactions actually processed: 27
latency average = 4492.845 ms

-- progressive_explain = on
-- progressive_explain_interval = 1s (default)
-- progressive_explain_sample_rate = 0.01 (default)

postgres@ip-172-31-39-191:~$ /usr/local/pgsql/bin/pgbench -n -T 300 -c 1 -f
script.sql
number of transactions actually processed: 26
latency average = 4656.067 ms

-- progressive_explain = on
-- progressive_explain_interval = 10ms
-- progressive_explain_sample_rate = 0.01 (default)

postgres@ip-172-31-39-191:~$ /usr/local/pgsql/bin/pgbench -n -T 300 -c 1 -f
script.sql
number of transactions actually processed: 26
latency average = 4785.608 ms

C) EXPLAIN ANALYZE performance in patched with and without
progressive_explain_sample_rate, ie, sampling with 2 different values
and also no sampling logic:

-- progressive_explain = off

postgres@ip-172-31-39-191:~$ /usr/local/pgsql/bin/pgbench -n -T 120 -c 1 -f
script.sql
number of transactions actually processed: 27
latency average = 4492.845 ms

-- progressive_explain = on
-- progressive_explain_interval = 1s (default)
-- progressive_explain_sample_rate = 0.01 (default)

postgres@ip-172-31-39-191:~$ /usr/local/pgsql/bin/pgbench -n -T 120 -c 1 -f
script.sql
number of transactions actually processed: 26
latency average = 4656.067 ms

-- progressive_explain = on
-- progressive_explain_interval = 1s (default)
-- progressive_explain_sample_rate = 1

postgres@ip-172-31-39-191:~$ /usr/local/pgsql/bin/pgbench -n -T 120 -c 1 -f
script.sql
number of transactions actually processed: 19
latency average = 6432.902 ms

-- progressive_explain = on
-- progressive_explain_interval = 1s (default)
-- NO SAMPLING LOGIC

postgres@ip-172-31-39-191:~$ /usr/local/pgsql/bin/pgbench -n -T 120 -c 1 -f
script.sql
number of transactions actually processed: 21
latency average = 5864.820 ms

BENCHMARK RESULTS:

It definitely needs more testing but preliminary results show that:

- From (A) we see that the patch adds negligible overhead when the
feature is not used. Enabling globally reduces overall TPS as all
queries are spending time printing the plan. The idea is to enable
progressive_explain on a per-need basis, only to a subset of sessions
that need it.

- From (B) we see that using progressive explain slightly increases
total execution time. Difference between using progressive_explain_interval
set to 1s (plan printed 4 times per query in the test) and to
10ms (plan printed ~460 times per query in the test) is very small.
The actual overhead appears when changing progressive_explain_sample_rate.

- From (C) we see that progressive_explain_sample_rate with a low
value (default 0.01) performs better than not using sampling or
using progressive_explain_sample_rate = 1. So the overhead of having
the sampling logic is much lower than not sampling at all.

TESTS:

Currently working on tests for a second version of the patch.

DOCUMENTATION:

Added documentation for the new view pg_stat_progress_explain,
new GUCs and a new item in section 14.1:

14.1. Using EXPLAIN
  14.1.1. EXPLAIN Basics
  14.1.2. EXPLAIN ANALYZE
  14.1.3. Caveats
  14.1.4. Progressive EXPLAIN

FURTHER DISCUSSION:

Considering that this patch introduces a new major feature with
several new components (view, GUCs, etc), there is open room for
discussion such as:

- Do the columns in pg_stat_progress_explain make sense? Are we
missing or adding unnecessary columns?

- Do the new GUCs make sense and are their default values appropriate?

- Do we want progressive explain to print plans of regular queries
started without EXPLAIN if progressive_explain is enabled or should
the feature be restricted to instrumented queries (EXPLAIN ANALYZE)?

- Is the size of explainHash based on max_connections + max_parallel_workers
large enough or are there other types of backends that use the
executor and will print plans too?

Regards,

Rafael Castro.

v1-0001-Proposal-for-progressive-explains.patch
Description: Binary data

Proposal: Progressive explain

Reply via email to