Hi All, I've organized the discussion so far and improved the patch. The issues and their status are below.
* Issues and status (or comment):
I've numbered them for ease of management.
====
T6. Changed option to show extended statistics (from VERBOSE to STATS)
-> Done already on the previous patch
T4. Add new node (to resolve errors in cfbot and prepared statement)
-> Done on the attached patch maybe
M2. Handle VERBOSE option more correctly
-> Done on the attache patch
M4. Organize the specifications for the types of ExtStats to display
-> Done. I understood the specifications (restrictions) below and wrote
it
on explain.sgml.
- depndencied or mcv of extended stats are displayed
- ndistinct is excluded
T9. Add log_stats option to auto_explain
-> Done on the attached patches
I2. Add tab completion to psql
-> Done on the attached patches
T7. Add documents
-> Partially completed.
- Done:
- doc/src/sgml/ref/explain.sgml
- doc/src/sgml/auto-explain.sgml
- Not started yet:
- doc/src/sgml/ref/create_statistics.sgml
- doc/src/sgml/perform.sgml
- doc/src/sgml/planstats.sgml
T8. Add regression test (stats_ext.sql)
-> Work in progress.
M5. Extstats showing twice issue
-> Work in progress.
The problem is related to Partial/Final groupby or Parallel query.
I found that HashParallel is one of the causes to show it twice.
(See result.txt)
I1. Fix error when query has nested OR condition
-> Not started yet.
As a starting point, it's good to understand the difference
between maybe_extract_actual_clauses() and extract_actual_clauses(),
right?
T5. Measure overhead of the feature
-> Not started yet.
But I understood a conditions to measure overhead:
- simple query without no extended stats
- simple query with 100 extended stats
====
* Regarding to the attached patches:
This patch is WIP patch including the following fixes:
====
- 0001-Add-a-new-option-STATS-to-EXPLAIN-command-r3.patch
- Rebased on 9926f854
- Added a new struct Applied_ExtStats in plannode.h (T4)
- Hopefully this will solve the issue related to
"-DCOPY_PARSE_PLAN_TREES"
with cfbot.
- To pass the extended statistics information from path to plan,
it might be more appropriate to define a new structure in
primnode.h
rather than plannode.h. Any advice would be appreciated.
- Handled EXPLAIN(STATS, VERBOSE) option (M2)
- Before the fix, schema name was always added to extended
statistics name,
but with this patch, schema name is added to the following only
when
VERBOSE option is selected:
- Extended statistics name, table name, and column name
- Added Supported extended statistics types in document (M4)
- 0002-Add-a-new-option-auto_explain.log_stats.patch
- It allows to use auto_explain.log_stats option on auto_explain
- 0003-Add-a-new-tab-completion-for-EXPLAIN-STATS-on-psql.patch
- When you run "EXPLAIN (<tab>" or "EXPLAIN (S<tab>" on psql,
"STATS" string is displayed.
=====
Finally, the attached test.sql and result.txt are a test case (incomplete)
and
its results. When the test case is completed, it will be merged into
ext_stats.sql.
Any advice is welcome!
Regards,
Tatsuro Yamada
\set PROMPT1 '%[%033[1m%]%M %n@%/:%>%R%[%033[0m%]%# '
\set HISTFILE ~/.psql_history- :PORT
-- This function checks which clauses have extended statistics applied
create or replace function check_clause(text, text) returns table (clause text,
extstats text)
language plpgsql as
$$
declare
sql text;
ln text;
tmp text[];
c text;
s text;
begin
if ($1 is not null) then
sql := format('explain (stats, costs off, %s) %s', $1, $2);
else
sql := format('explain (stats, costs off) %s', $2);
end if;
for ln in
execute sql
loop
tmp := regexp_match(ln, 'Filter: .*|Group Key: .*');
if tmp is not null then
c = tmp[1]::text;
end if;
tmp := regexp_match(ln, 'Ext Stats: (.*)');
if tmp is not null then
s = tmp[1]::text;
end if;
if (c is not null) and (s is not null) then
return query select c, s;
c := null;
s := null;
end if;
end loop;
end;
$$;
CREATE FUNCTION
-- prepare
create table t (a int, b int);
CREATE TABLE
insert into t select mod(i, 10), mod(i, 10) from generate_series(1, 100000)
s(i);
INSERT 0 100000
-- without extended stats
-- where
EXPLAIN (stats, costs off) select * from t where a = 1 and b = 1;
QUERY PLAN
---------------------------------
Seq Scan on t
Filter: ((a = 1) AND (b = 1))
(2 rows)
SELECT * FROM check_clause(null, 'select * from t where a = 1 and b = 1');
clause | extstats
--------+----------
(0 rows)
-- group by
EXPLAIN (stats, costs off) select 1 from t group by a, b;
QUERY PLAN
---------------------
HashAggregate
Group Key: a, b
-> Seq Scan on t
(3 rows)
SELECT * FROM check_clause(null, 'select * from t group by a, b');
clause | extstats
--------+----------
(0 rows)
-- create extended stats
create statistics s on a, b from t;
CREATE STATISTICS
analyze t;
ANALYZE
\dX
List of extended statistics
Schema | Name | Definition | Ndistinct | Dependencies | MCV
--------+------+-------------+-----------+--------------+---------
public | s | a, b FROM t | defined | defined | defined
(1 row)
-- show extended stats without verbose option
-- where
-- EXPLAIN (stats, costs off) select * from t where a = 1 and b = 1;
SELECT * FROM check_clause(null, 'select * from t where a = 1 and b = 1');
clause | extstats
-------------------------------+-----------------------------------
Filter: ((a = 1) AND (b = 1)) | s Clauses: ((a = 1) AND (b = 1))
(1 row)
-- group by
-- EXPLAIN (stats, costs off) select 1 from t group by a, b;
SELECT * FROM check_clause(null, 'select * from t group by a, b');
clause | extstats
-----------------+------------------
Group Key: a, b | s Clauses: a, b
(1 row)
-- use verbose option
-- where
SELECT * FROM check_clause('verbose', 'select * from t where a = 1 and b = 1');
clause | extstats
-----------------------------------+----------------------------------------------
Filter: ((t.a = 1) AND (t.b = 1)) | public.s Clauses: ((t.a = 1) AND (t.b =
1))
(1 row)
-- group by
SELECT * FROM check_clause('verbose', 'select * from t group by a, b');
clause | extstats
---------------------+-----------------------------
Group Key: t.a, t.b | public.s Clauses: t.a, t.b
(1 row)
-- display Ext Stats: twice?!
-- HashAgg
EXPLAIN (stats, costs off) select 1 from t group by a, b;
QUERY PLAN
-------------------------------------
HashAggregate
Group Key: a, b
-> Seq Scan on t
Ext Stats: s Clauses: a, b
(4 rows)
SELECT * FROM check_clause(null, 'select * from t group by a, b');
clause | extstats
-----------------+------------------
Group Key: a, b | s Clauses: a, b
(1 row)
-- GroupAgg
set enable_hashagg to off;
SET
EXPLAIN (stats, costs off) select 1 from t group by a, b;
QUERY PLAN
-------------------------------------------
Group
Group Key: a, b
-> Sort
Sort Key: a, b
-> Seq Scan on t
Ext Stats: s Clauses: a, b
(6 rows)
SELECT * FROM check_clause(null, 'select * from t group by a, b');
clause | extstats
-----------------+------------------
Group Key: a, b | s Clauses: a, b
(1 row)
set enable_hashagg to on;
SET
-- Parallel
-- shows twice
set min_parallel_table_scan_size to '0';
SET
EXPLAIN (stats, costs off) select 1 from t group by a, b;
QUERY PLAN
-------------------------------------
HashAggregate
Group Key: a, b
-> Seq Scan on t
Ext Stats: s Clauses: a, b
Ext Stats: s Clauses: a, b
(5 rows)
SELECT * FROM check_clause(null, 'select * from t group by a, b');
clause | extstats
-----------------+------------------
Group Key: a, b | s Clauses: a, b
(1 row)
set min_parallel_table_scan_size to default;
SET
-- Parallel Hash
-- This is the reason to show ExtStats twice
set enable_parallel_hash to off;
SET
EXPLAIN (stats, costs off) select 1 from t group by a, b;
QUERY PLAN
-------------------------------------
HashAggregate
Group Key: a, b
-> Seq Scan on t
Ext Stats: s Clauses: a, b
(4 rows)
SELECT * FROM check_clause(null, 'select * from t group by a, b');
clause | extstats
-----------------+------------------
Group Key: a, b | s Clauses: a, b
(1 row)
set enable_parallel_hash to default;
SET
-- prepare statement: not display Ext Stats because Topmost command is EXECUTE,
-- so isExplain_stats flag seems false
PREPARE ps (INT, INT) AS SELECT * FROM t WHERE a = $1 AND b = $2;
PREPARE
EXPLAIN(stats, costs on, analyze) EXECUTE ps(5, 5);
QUERY PLAN
-------------------------------------------------------------------------------------------------------
Seq Scan on t (cost=0.00..1943.00 rows=10033 width=8) (actual
time=0.024..13.807 rows=10000 loops=1)
Filter: ((a = 5) AND (b = 5))
Rows Removed by Filter: 90000
Buffers: shared hit=443
Planning Time: 0.092 ms
Execution Time: 14.271 ms
(6 rows)
SELECT * FROM check_clause(null, 'EXECUTE ps(5, 5)');
clause | extstats
--------+----------
(0 rows)
-- crean up
DROP TABLE t;
DROP TABLE
DROP FUNCTION check_clause(text, text);
DROP FUNCTION
\dX
List of extended statistics
Schema | Name | Definition | Ndistinct | Dependencies | MCV
--------+------+------------+-----------+--------------+-----
(0 rows)
0002-Add-a-new-option-auto_explain.log_stats.patch
Description: Binary data
test.sql
Description: Binary data
0003-Add-a-new-tab-completion-for-EXPLAIN-STATS-on-psql.patch
Description: Binary data
0001-Add-a-new-option-STATS-to-EXPLAIN-command-r3.patch
Description: Binary data
