Re: pg_stat_advisor extension

2024-05-13 Thread Ilia Evdokimov
1. In the case of parallel workers the plan_rows value has a different semantics than the number of rows predicted. Just explore 

get_parallel_divisor().

2. The extension recommends new statistics immediately upon an error finding. But what if the reason for the error is stale statistics? Or 
this error may be raised for only one specific set of constants, and 
estimation will be done well in another 99.% of cases for the same 
expression.


The new parameter, `pg_stat_advisor.analyze_scale_factor`, can suggest 
the execution of the ANALYZE command on specific tables. The extension 
now evaluates the ratio of `n_live_tup` (number of live tuples) to 
`n_mod_since_analyze` (number of modifications since last analyze) in 
the `pg_stat_all_tables` catalog. If this ratio exceeds the value 
specified in `analyze_scale_factor`, the extension will suggest an 
update to the table's statistics.


There are a lot of parameters that influences on estimated rows. 
Statistics might not help improve estimated rows. This feature is 
designed to provide users with data-driven insights to decide whether 
updating statistics via the ANALYZE command could potentially improve 
query performance. By suggesting rather than automatically executing 
statistics updates, we empower you to make informed decisions based on 
the specific needs and conditions of your database environment.


I've developed an extension that provides suggestions on whether to 
update or create statistics for your PostgreSQL database, without 
executing any changes. This approach allows you to consider various 
parameters that influence row estimates and make informed decisions 
about optimizing your database's performance.


Your feedback is invaluable, and we look forward to hearing about your 
experiences and any improvements you might suggest. Best regards, Ilia 
Evdokimov Tantor Labs LLC.
From eb998bea96a3640d240afa63e08cc8cf98925bf7 Mon Sep 17 00:00:00 2001
From: Ilia Evdokimov 
Date: Mon, 13 May 2024 14:41:59 +0300
Subject: [PATCH] 'pg_stat_advisor' extension

This service as a hook into executor. It has two GUC-parameters.

pg_stat_advisor.analyze_scale_factor: if ratio of pg_stat_all_tables.n_live_tup to pg_stat_all_tables.n_mod_since_analyze is greater than pg_stat_advisor.analyze_scale_factor extension prints suggestion executing ANALYZE command.

pg_stat_advisor.suggest_statistics_threshold: the ratio of total rows to planned rows is greater than or equal to this threshold the extension prints suggestion executing the creation of statistics, using the naming format 'relationName_columns'
---
 contrib/Makefile  |   1 +
 contrib/pg_stat_advisor/Makefile  |  20 +
 .../expected/pg_stat_advisor.out  |  52 ++
 contrib/pg_stat_advisor/meson.build   |  30 +
 contrib/pg_stat_advisor/pg_stat_advisor.c | 560 ++
 .../pg_stat_advisor/sql/pg_stat_advisor.sql   |  24 +
 6 files changed, 687 insertions(+)
 create mode 100644 contrib/pg_stat_advisor/Makefile
 create mode 100644 contrib/pg_stat_advisor/expected/pg_stat_advisor.out
 create mode 100644 contrib/pg_stat_advisor/meson.build
 create mode 100644 contrib/pg_stat_advisor/pg_stat_advisor.c
 create mode 100644 contrib/pg_stat_advisor/sql/pg_stat_advisor.sql

diff --git a/contrib/Makefile b/contrib/Makefile
index abd780f277..d6ce2fe562 100644
--- a/contrib/Makefile
+++ b/contrib/Makefile
@@ -33,6 +33,7 @@ SUBDIRS = \
 		pg_buffercache	\
 		pg_freespacemap \
 		pg_prewarm	\
+		pg_stat_advisor \
 		pg_stat_statements \
 		pg_surgery	\
 		pg_trgm		\
diff --git a/contrib/pg_stat_advisor/Makefile b/contrib/pg_stat_advisor/Makefile
new file mode 100644
index 00..f31b939e8a
--- /dev/null
+++ b/contrib/pg_stat_advisor/Makefile
@@ -0,0 +1,20 @@
+# contrib/pg_stat_advisor/Makefile
+
+MODULE_big = pg_stat_advisor
+OBJS = \
+	$(WIN32RES) \
+	pg_stat_advisor.o
+PGFILEDESC = "pg_stat_advisor - analyze query performance and recommend the creation of additional statistics"
+
+REGRESS = pg_stat_advisor
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = contrib/pg_stat_advisor
+top_builddir = ../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/contrib/pg_stat_advisor/expected/pg_stat_advisor.out b/contrib/pg_stat_advisor/expected/pg_stat_advisor.out
new file mode 100644
index 00..8f3dab2c2f
--- /dev/null
+++ b/contrib/pg_stat_advisor/expected/pg_stat_advisor.out
@@ -0,0 +1,52 @@
+LOAD 'pg_stat_advisor';
+SET pg_stat_advisor.analyze_scale_factor = 0.4;
+SET pg_stat_advisor.suggest_statistics_threshold = 0.11;
+CREATE TABLE my_tbl(fld_1 INTEGER, fld_2 BIGINT) WITH (autovacuum_enabled = false);
+INSERT INTO my_tbl (fld_1, fld_2)
+SELECT
+ i/100 as fld_1,
+ i/500 as fld_2
+FROM generate_series(1, 1000) s(i);
+ANALYZE my_tbl;
+INSERT INTO my_tbl (fld_1, fld_2)
+SELECT
+ i/10

Re: pg_stat_advisor extension

2024-02-08 Thread Ilia Evdokimov

On Feb 08 2024 at 07:14:18, Andrei Lepikhov wrote:

1. In the case of parallel workers the plan_rows value has a different 
semantics than the number of rows predicted. Just explore 
get_parallel_divisor().
Yes, this is a very weighty and important issue. I need to think about 
this very carefully.
2. The extension recommends new statistics immediately upon an error 
finding. But what if the reason for the error is stale statistics? Or 
this error may be raised for only one specific set of constants, and 
estimation will be done well in another 99.% of cases for the same 
expression.
According to No.2, it might make sense to collect and track clause 
combinations and cardinality errors found and let the DBA make decisions 
on their own.
Your proposal is very interesting. In my opinion, it is worth 
considering updating the extended statistics if they are truly stale. 
And write about this in a separate message with suggestion updating 
statistics.
If I succeed, then in the next patch I will add the kind of extended 
statistics to the message, deal with the parallel workers and update 
statistics if necessary.
If you have additional suggestions and thoughts, feel free to write them 
in this thread.



Regards,
Ilia Evdokimov,
Tantor Labs LLC.

Re: pg_stat_advisor extension

2024-02-08 Thread Ilia Evdokimov
Our further discussion of this new extension takes place in this thread: https://www.postgresql.org/message-id/flat/f822b674-9697-43b9-931b-4d69729a26ff%40tantorlabs.com .Due to technical difficulties in the current thread, I will not be able to conduct a dialogue except in HTML format. And this will make it inconvenient for everyone to read the messages. I apologize for the inconvenience caused. Regards, Ilia Evdokimov, TantorLabs LLC.




Re: pg_stat_advisor extension

2024-02-08 Thread Ilia Evdokimov

On Feb 8 2024 at 00:00:00 jian he

>INT MAX

>should be 1.0?

I don’t know why Konstantin Knizhnik used the ratio of actual tuples to 
the planned ones, but most who start testing my extension expect that it 
will be a coefficient from 0 to 1, which will be the ratio of the 
estimated tuples to the actual ones. Therefore, I changed the value of 
this coefficient the other way around and now the value can be from 0 to 
1. The patch with changes has been attached.



> now CREATE STATISTICS, the statistics name is optional

I constructed the name of the statistics so that the user could copy the 
line with 'CREATE STATISTICS' with the mouse and execute this command 
faster. But if the user wants ITS name, he can do it manually.



> here you can explicitly mention the statistics kind would be great

I agree with you. That would be my next step. That's why I'm doing it now.


> Also since the documentation is limited, more comments 
explainingSuggestMultiColumnStatisticsForNode would be great.



overall the comments are very little, it should be more (that's my opinion).


Yes, certainly. I'll do it in the next patch.

I'm looking forward to your thoughts and feedback.

Regards,

Ilia Evdokimov,

Tantor Labs LLC.
From f87f4a57e532d57f43dab4764d08ddf83d9f3d8f Mon Sep 17 00:00:00 2001
From: Ilia Evdokimov 
Date: Thu, 8 Feb 2024 16:00:57 +0300
Subject: [PATCH] 'pg_stat_advisor' extension.

This serves as a hook into the executor for determining total rows and planned rows.
The process starts by checking the `pg_stat_advisor.suggest_statistics_threshold` GUC parameter.
If it's set to 0.0, the extension does not proceed further. When the parameter is greater than 0.0,
the extension evaluates the accuracy of planned rows. A suggestion for creating statistics is made
if the ratio of total rows to planned rows is greater than or equal to this threshold.

Only then does it extract the relation and columns from the query.
The extension checks pg_statistic_ext for existing relevant statistics. If no statistics are found,
it prints a notice suggesting the creation of statistics, using the naming format 'relationName_columns'.

Author: Ilia Evdokimov
---
 contrib/Makefile  |   1 +
 contrib/meson.build   |   1 +
 contrib/pg_stat_advisor/.gitignore|   3 +
 contrib/pg_stat_advisor/Makefile  |  20 +
 contrib/pg_stat_advisor/README.md |  85 +++
 .../expected/pg_stat_advisor.out  |  96 
 contrib/pg_stat_advisor/meson.build   |  30 ++
 contrib/pg_stat_advisor/pg_stat_advisor.c | 482 ++
 .../pg_stat_advisor/sql/pg_stat_advisor.sql   |  50 ++
 9 files changed, 768 insertions(+)
 create mode 100644 contrib/pg_stat_advisor/.gitignore
 create mode 100644 contrib/pg_stat_advisor/Makefile
 create mode 100644 contrib/pg_stat_advisor/README.md
 create mode 100644 contrib/pg_stat_advisor/expected/pg_stat_advisor.out
 create mode 100644 contrib/pg_stat_advisor/meson.build
 create mode 100644 contrib/pg_stat_advisor/pg_stat_advisor.c
 create mode 100644 contrib/pg_stat_advisor/sql/pg_stat_advisor.sql

diff --git a/contrib/Makefile b/contrib/Makefile
index da4e2316a3..da9a4ceeaa 100644
--- a/contrib/Makefile
+++ b/contrib/Makefile
@@ -34,6 +34,7 @@ SUBDIRS = \
 		pg_buffercache	\
 		pg_freespacemap \
 		pg_prewarm	\
+		pg_stat_advisor \
 		pg_stat_statements \
 		pg_surgery	\
 		pg_trgm		\
diff --git a/contrib/meson.build b/contrib/meson.build
index c12dc906ca..a20d99443b 100644
--- a/contrib/meson.build
+++ b/contrib/meson.build
@@ -49,6 +49,7 @@ subdir('pgcrypto')
 subdir('pg_freespacemap')
 subdir('pg_prewarm')
 subdir('pgrowlocks')
+subdir('pg_stat_advisor')
 subdir('pg_stat_statements')
 subdir('pgstattuple')
 subdir('pg_surgery')
diff --git a/contrib/pg_stat_advisor/.gitignore b/contrib/pg_stat_advisor/.gitignore
new file mode 100644
index 00..913175ff6e
--- /dev/null
+++ b/contrib/pg_stat_advisor/.gitignore
@@ -0,0 +1,3 @@
+/log/
+/results/
+/tmp_check/
diff --git a/contrib/pg_stat_advisor/Makefile b/contrib/pg_stat_advisor/Makefile
new file mode 100644
index 00..f31b939e8a
--- /dev/null
+++ b/contrib/pg_stat_advisor/Makefile
@@ -0,0 +1,20 @@
+# contrib/pg_stat_advisor/Makefile
+
+MODULE_big = pg_stat_advisor
+OBJS = \
+	$(WIN32RES) \
+	pg_stat_advisor.o
+PGFILEDESC = "pg_stat_advisor - analyze query performance and recommend the creation of additional statistics"
+
+REGRESS = pg_stat_advisor
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = contrib/pg_stat_advisor
+top_builddir = ../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/contrib/pg_stat_advisor/README.md b/contrib/pg_stat_advisor/README.md
new file mode 100644
index 00..3f4e97f195
--- /dev/null
+++ b/contrib/pg_stat_advisor/README.md
@@ -0,0 +1,85 @@
+## pg_stat_adv

Re: pg_stat_advisor extension

2024-02-06 Thread Ilia Evdokimov

Dear Team,

Firstly, I would like to extend my sincere apologies for the confusion 
and technical oversights in our previous discussions regarding the 
'pg_stat_advisor extension'. To address this and facilitate a clearer, 
more focused dialogue, I have initiated a new thread to consolidate our 
discussions on this matter.


For context, our previous conversation can be found here: 
https://www.postgresql.org/message-id/flat/4681151706615977%40mail.yandex.ru.


The extension 'pg_stat_advisor' extension is architected to optimize 
query plan. It operates by suggesting when to create extended 
statistics, particularly in queries where current selectivity estimates 
fall short. This is achieved through the GUC parameter 
'pg_stat_advisor.suggest_statistics_threshold', which assesses the ratio 
of total tuples compared to the planned rows. This feature is 
instrumental in identifying scenarios where the planner's estimates 
could be optimized.


You can install the extension by:

```

LOAD 'pg_stat_advisor'

SET pg_stat_advisor.suggest_statistics_threshold = 1.0;

```


Example:

```

EXPLAIN ANALYZE SELECT * FROM t WHERE i = 100 AND j = 10;

NOTICE: pg_stat_advisor suggestion: CREATE STATISTICS t_i_j ON i, j FROM t

    QUERY PLAN



```

After EXPLAIN ANALYZE command you can see the message of suggestion 
creating statistics with name 't_i_j' on 'i', 'j' columns from 't' table.


Thank you for your understanding, patience, and continued support.



Best regards,
Ilia Evdokimov,
Tantor Labs LLC.




Re: pg_stat_advisor extension

2024-02-06 Thread Ilia Evdokimov

Hi hackers,


I've encountered and addressed errors in the 
"0001-pg_stat_advisor-extension.patch" when applying it to the main 
branch, specifically trailing whitespace issues at lines 117 and 118:


```
0001-pg_stat_advisor-extension.patch:117: trailing whitespace.
   QUERY PLAN
0001-pg_stat_advisor-extension.patch:118: trailing whitespace.

warning: 2 lines add whitespace errors.

```

An updated patch is attached for review


I welcome your insights, feedback, and evaluations regarding the 
necessity of integrating this new extension into PostgreSQL.



Kind regards,

Ilia Evdokimov,

Tantor Labs LLC.
From 6316706c42996219e507bb6ded9dd1e872180e38 Mon Sep 17 00:00:00 2001
From: Ilia Evdokimov 
Date: Tue, 6 Feb 2024 18:11:04 +0300
Subject: [PATCH] pg_stat_advisor extension

---
 contrib/Makefile  |   1 +
 contrib/meson.build   |   1 +
 contrib/pg_stat_advisor/.gitignore|   3 +
 contrib/pg_stat_advisor/Makefile  |  20 +
 contrib/pg_stat_advisor/README.md |  85 
 .../expected/pg_stat_advisor.out  |  96 
 contrib/pg_stat_advisor/meson.build   |  30 ++
 contrib/pg_stat_advisor/pg_stat_advisor.c | 477 ++
 .../pg_stat_advisor/sql/pg_stat_advisor.sql   |  50 ++
 9 files changed, 763 insertions(+)
 create mode 100644 contrib/pg_stat_advisor/.gitignore
 create mode 100644 contrib/pg_stat_advisor/Makefile
 create mode 100644 contrib/pg_stat_advisor/README.md
 create mode 100644 contrib/pg_stat_advisor/expected/pg_stat_advisor.out
 create mode 100644 contrib/pg_stat_advisor/meson.build
 create mode 100644 contrib/pg_stat_advisor/pg_stat_advisor.c
 create mode 100644 contrib/pg_stat_advisor/sql/pg_stat_advisor.sql

diff --git a/contrib/Makefile b/contrib/Makefile
index da4e2316a3..da9a4ceeaa 100644
--- a/contrib/Makefile
+++ b/contrib/Makefile
@@ -34,6 +34,7 @@ SUBDIRS = \
 		pg_buffercache	\
 		pg_freespacemap \
 		pg_prewarm	\
+		pg_stat_advisor \
 		pg_stat_statements \
 		pg_surgery	\
 		pg_trgm		\
diff --git a/contrib/meson.build b/contrib/meson.build
index c12dc906ca..a20d99443b 100644
--- a/contrib/meson.build
+++ b/contrib/meson.build
@@ -49,6 +49,7 @@ subdir('pgcrypto')
 subdir('pg_freespacemap')
 subdir('pg_prewarm')
 subdir('pgrowlocks')
+subdir('pg_stat_advisor')
 subdir('pg_stat_statements')
 subdir('pgstattuple')
 subdir('pg_surgery')
diff --git a/contrib/pg_stat_advisor/.gitignore b/contrib/pg_stat_advisor/.gitignore
new file mode 100644
index 00..913175ff6e
--- /dev/null
+++ b/contrib/pg_stat_advisor/.gitignore
@@ -0,0 +1,3 @@
+/log/
+/results/
+/tmp_check/
diff --git a/contrib/pg_stat_advisor/Makefile b/contrib/pg_stat_advisor/Makefile
new file mode 100644
index 00..f31b939e8a
--- /dev/null
+++ b/contrib/pg_stat_advisor/Makefile
@@ -0,0 +1,20 @@
+# contrib/pg_stat_advisor/Makefile
+
+MODULE_big = pg_stat_advisor
+OBJS = \
+	$(WIN32RES) \
+	pg_stat_advisor.o
+PGFILEDESC = "pg_stat_advisor - analyze query performance and recommend the creation of additional statistics"
+
+REGRESS = pg_stat_advisor
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = contrib/pg_stat_advisor
+top_builddir = ../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/contrib/pg_stat_advisor/README.md b/contrib/pg_stat_advisor/README.md
new file mode 100644
index 00..f9610f2ed5
--- /dev/null
+++ b/contrib/pg_stat_advisor/README.md
@@ -0,0 +1,85 @@
+## pg_stat_advisor - PostgreSQL advisor to create statistics
+
+pg_stat_advisor is a PostgreSQL extension designed to analyze query performance and recommend the creation of additional statistics to improve query plan.
+
+Append pg_stat_advisor to shared_preload_libraries configuration parameter in your postgresql.conf file then restart the PostgreSQL database to apply the changes. Or you can use "LOAD 'pg_stat_advisor';"command
+```
+LOAD 'pg_stat_advisor';
+```
+
+There is the pg_stat_advisor.suggest_statistics_threshold GUC that can be used to set a suggest_statistics_threshold. It is the the ratio of total tuples produced compared to the planned rows. If parameter is set by 0, the printing switches off.
+
+For example:
+```
+SET pg_stat_advisor.suggest_statistics_threshold = 1.0;
+```
+
+Examples:
+
+
+```
+postgres=# create table t (i int, j int);
+CREATE TABLE
+postgres=# insert into t select i/10, i/100 from generate_series(1, 100) i;
+INSERT 0 100
+postgres=# analyze t;
+ANALYZE
+postgres=# explain analyze select * from t where i = 100 and j = 10;
+   QUERY PLAN
+
+--
+--
+ Gather  (cost=1000.00..11675.10 rows=1 width=8) (actual time=0.526..61.564 rows=10 l

Re: pg_stat_advisor extension

2024-02-05 Thread Ilia Evdokimov
Hi hackers, I'm reaching out again regarding the patch with new extension 'pg_stat_advisor' aimed at enhancing query plan efficiency through the suggestion of creating statistics. I understand the community is busy, but I would greatly value any feedback or thoughts on this extension. Thank you for your time and consideration. Best regards,Ilia Evdokimov,Tantor Labs LLC.