Hi hackers,
It's quite common that poor query performance can be attributed to
inaccurate row estimations by the planner. To make it easier to detect
these discrepancies, rather than scrutinizing the estimates manually, it
would be helpful to output a dedicated |NOTICE| message.
In the current patch, I've introduced a new GUC parameter called
'estimated_rows_scale_factor'. If the ratio of the estimated rows to the
actual rows is less than this factor, or if the estimated rows
significantly exceed the actual rows (when the ratio is greater than
this factor), a NOTICE message will be printed. The message reads:
"Estimated rows (%.0f) less(greater) than actual rows (%.0f)."
Here is an example:
CREATE TABLE t(a int, b int);
INSERT INTO t SELECT x/10, x FROM generate_series(1,10000000) g(x);
ANALYZE;
SET estimated_rows_scale_factor = 0.9;
EXPLAIN ANALYZE SELECT * FROM t WHERE a > 10 AND b <= 200;
NOTICE: Estimated rows (1000) greater than actual rows (91).
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------
Gather (cost=1000.00..107848.00 rows=1000 width=8) (actual
time=0.446..122.476 rows=91 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Parallel Seq Scan on t (cost=0.00..106748.00 rows=417 width=8)
(actual time=77.657..118.000 rows=30 loops=3)
Filter: ((a > 10) AND (b <= 200))
Rows Removed by Filter: 3333303
Planning Time: 0.097 ms
Execution Time: 122.502 ms
(8 rows)
EXPLAIN ANALYZE SELECT * FROM t WHERE a = 10 AND b <= 200;
NOTICE: Estimated rows (1) less than actual rows (10).
QUERY PLAN
------------------------------------------------------------------------------------------------------------------
Gather (cost=1000.00..107748.10 rows=1 width=8) (actual
time=0.280..104.752 rows=10 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Parallel Seq Scan on t (cost=0.00..106748.00 rows=1 width=8)
(actual time=66.493..101.102 rows=3 loops=3)
Filter: ((b <= 200) AND (a = 10))
Rows Removed by Filter: 3333330
Planning Time: 0.129 ms
Execution Time: 104.768 ms
(8 rows)
If you have any suggestions regarding the wording of the message, its
placement, or if you'd like to see a different criterion used, I would
greatly appreciate your feedback.
Looking forward to your thoughts and suggestions.
--
Regards,
Ilia Evdokimov,
Tantor Labs LCC.
From 8e038f16d45d5f8c39c7204fc661ed4b7f504a63 Mon Sep 17 00:00:00 2001
From: Ilia Evdokimov <ilya.evdoki...@tantorlabs.com>
Date: Thu, 5 Sep 2024 18:15:01 +0300
Subject: [PATCH v1] Add message when estimated rows differ from actual rows
If the ratio of estimated rows to actual rows is less
than the 'estimated_rows_scale_factor', or if the estimated rows
exceed the actual rows, a NOTICE message will be printed.
This helps to identify discrepancies between the planner's estimations
and the actual query execution, enabling better analysis and optimization.
---
src/backend/commands/explain.c | 31 +++++++++++++++++++++++++++++
src/backend/utils/misc/guc_tables.c | 11 ++++++++++
src/include/utils/guc.h | 1 +
3 files changed, 43 insertions(+)
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 11df4a04d4..da435cfee1 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -419,6 +419,35 @@ ExplainResultDesc(ExplainStmt *stmt)
return tupdesc;
}
+/*
+ * Notice whether estimated rows is bad.
+ */
+static void
+ExplainPrintEstimatedRowsAttention(ExplainState *es, QueryDesc *queryDesc)
+{
+ double rows;
+ double nloops;
+ PlanState *planstate = queryDesc->planstate;
+
+ if (es->analyze &&
+ planstate->instrument && planstate->instrument->nloops > 0)
+ {
+ nloops = planstate->instrument->nloops;
+ rows = planstate->instrument->ntuples / nloops;
+
+ if(queryDesc->planstate->plan->plan_rows / rows < estimated_rows_scale_factor)
+ ereport(NOTICE,
+ (errmsg("Estimated rows (%.0f) less than actual rows (%.0f).",
+ queryDesc->planstate->plan->plan_rows,
+ rows), errhidestmt(true)));
+ else if (rows / queryDesc->planstate->plan->plan_rows < estimated_rows_scale_factor)
+ ereport(NOTICE,
+ (errmsg("Estimated rows (%.0f) greater than actual rows (%.0f).",
+ queryDesc->planstate->plan->plan_rows,
+ rows), errhidestmt(true)));
+ }
+}
+
/*
* ExplainOneQuery -
* print out the execution plan for one Query
@@ -721,6 +750,8 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
/* Create textual dump of plan tree */
ExplainPrintPlan(es, queryDesc);
+ ExplainPrintEstimatedRowsAttention(es, queryDesc);
+
/* Show buffer and/or memory usage in planning */
if (peek_buffer_usage(es, bufusage) || mem_counters)
{
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index 686309db58..fd05fcee09 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -528,6 +528,7 @@ int log_parameter_max_length_on_error = 0;
int log_temp_files = -1;
double log_statement_sample_rate = 1.0;
double log_xact_sample_rate = 0;
+double estimated_rows_scale_factor = 1.0;
char *backtrace_functions;
int temp_file_limit = -1;
@@ -3995,6 +3996,16 @@ struct config_real ConfigureNamesReal[] =
NULL, NULL, NULL
},
+ {
+ {"estimated_rows_scale_factor", PGC_SUSET, LOGGING_WHEN,
+ gettext_noop("Estimated rows, as a fraction of actual rows"),
+ NULL
+ },
+ &estimated_rows_scale_factor,
+ 0.0, 0.0, 1.0,
+ NULL, NULL, NULL
+ },
+
/* End-of-list marker */
{
{NULL, 0, 0, NULL, NULL}, NULL, 0.0, 0.0, 0.0, NULL, NULL, NULL
diff --git a/src/include/utils/guc.h b/src/include/utils/guc.h
index 840b0fe57f..9e676b694f 100644
--- a/src/include/utils/guc.h
+++ b/src/include/utils/guc.h
@@ -273,6 +273,7 @@ extern PGDLLIMPORT int log_min_duration_sample;
extern PGDLLIMPORT int log_min_duration_statement;
extern PGDLLIMPORT int log_temp_files;
extern PGDLLIMPORT double log_statement_sample_rate;
+extern PGDLLIMPORT double estimated_rows_scale_factor;
extern PGDLLIMPORT double log_xact_sample_rate;
extern PGDLLIMPORT char *backtrace_functions;
--
2.34.1