Hi All, I've organized the discussion so far and improved the patch. The issues and their status are below.
* Issues and status (or comment): I've numbered them for ease of management. ==== T6. Changed option to show extended statistics (from VERBOSE to STATS) -> Done already on the previous patch T4. Add new node (to resolve errors in cfbot and prepared statement) -> Done on the attached patch maybe M2. Handle VERBOSE option more correctly -> Done on the attache patch M4. Organize the specifications for the types of ExtStats to display -> Done. I understood the specifications (restrictions) below and wrote it on explain.sgml. - depndencied or mcv of extended stats are displayed - ndistinct is excluded T9. Add log_stats option to auto_explain -> Done on the attached patches I2. Add tab completion to psql -> Done on the attached patches T7. Add documents -> Partially completed. - Done: - doc/src/sgml/ref/explain.sgml - doc/src/sgml/auto-explain.sgml - Not started yet: - doc/src/sgml/ref/create_statistics.sgml - doc/src/sgml/perform.sgml - doc/src/sgml/planstats.sgml T8. Add regression test (stats_ext.sql) -> Work in progress. M5. Extstats showing twice issue -> Work in progress. The problem is related to Partial/Final groupby or Parallel query. I found that HashParallel is one of the causes to show it twice. (See result.txt) I1. Fix error when query has nested OR condition -> Not started yet. As a starting point, it's good to understand the difference between maybe_extract_actual_clauses() and extract_actual_clauses(), right? T5. Measure overhead of the feature -> Not started yet. But I understood a conditions to measure overhead: - simple query without no extended stats - simple query with 100 extended stats ==== * Regarding to the attached patches: This patch is WIP patch including the following fixes: ==== - 0001-Add-a-new-option-STATS-to-EXPLAIN-command-r3.patch - Rebased on 9926f854 - Added a new struct Applied_ExtStats in plannode.h (T4) - Hopefully this will solve the issue related to "-DCOPY_PARSE_PLAN_TREES" with cfbot. - To pass the extended statistics information from path to plan, it might be more appropriate to define a new structure in primnode.h rather than plannode.h. Any advice would be appreciated. - Handled EXPLAIN(STATS, VERBOSE) option (M2) - Before the fix, schema name was always added to extended statistics name, but with this patch, schema name is added to the following only when VERBOSE option is selected: - Extended statistics name, table name, and column name - Added Supported extended statistics types in document (M4) - 0002-Add-a-new-option-auto_explain.log_stats.patch - It allows to use auto_explain.log_stats option on auto_explain - 0003-Add-a-new-tab-completion-for-EXPLAIN-STATS-on-psql.patch - When you run "EXPLAIN (<tab>" or "EXPLAIN (S<tab>" on psql, "STATS" string is displayed. ===== Finally, the attached test.sql and result.txt are a test case (incomplete) and its results. When the test case is completed, it will be merged into ext_stats.sql. Any advice is welcome! Regards, Tatsuro Yamada
\set PROMPT1 '%[%033[1m%]%M %n@%/:%>%R%[%033[0m%]%# ' \set HISTFILE ~/.psql_history- :PORT -- This function checks which clauses have extended statistics applied create or replace function check_clause(text, text) returns table (clause text, extstats text) language plpgsql as $$ declare sql text; ln text; tmp text[]; c text; s text; begin if ($1 is not null) then sql := format('explain (stats, costs off, %s) %s', $1, $2); else sql := format('explain (stats, costs off) %s', $2); end if; for ln in execute sql loop tmp := regexp_match(ln, 'Filter: .*|Group Key: .*'); if tmp is not null then c = tmp[1]::text; end if; tmp := regexp_match(ln, 'Ext Stats: (.*)'); if tmp is not null then s = tmp[1]::text; end if; if (c is not null) and (s is not null) then return query select c, s; c := null; s := null; end if; end loop; end; $$; CREATE FUNCTION -- prepare create table t (a int, b int); CREATE TABLE insert into t select mod(i, 10), mod(i, 10) from generate_series(1, 100000) s(i); INSERT 0 100000 -- without extended stats -- where EXPLAIN (stats, costs off) select * from t where a = 1 and b = 1; QUERY PLAN --------------------------------- Seq Scan on t Filter: ((a = 1) AND (b = 1)) (2 rows) SELECT * FROM check_clause(null, 'select * from t where a = 1 and b = 1'); clause | extstats --------+---------- (0 rows) -- group by EXPLAIN (stats, costs off) select 1 from t group by a, b; QUERY PLAN --------------------- HashAggregate Group Key: a, b -> Seq Scan on t (3 rows) SELECT * FROM check_clause(null, 'select * from t group by a, b'); clause | extstats --------+---------- (0 rows) -- create extended stats create statistics s on a, b from t; CREATE STATISTICS analyze t; ANALYZE \dX List of extended statistics Schema | Name | Definition | Ndistinct | Dependencies | MCV --------+------+-------------+-----------+--------------+--------- public | s | a, b FROM t | defined | defined | defined (1 row) -- show extended stats without verbose option -- where -- EXPLAIN (stats, costs off) select * from t where a = 1 and b = 1; SELECT * FROM check_clause(null, 'select * from t where a = 1 and b = 1'); clause | extstats -------------------------------+----------------------------------- Filter: ((a = 1) AND (b = 1)) | s Clauses: ((a = 1) AND (b = 1)) (1 row) -- group by -- EXPLAIN (stats, costs off) select 1 from t group by a, b; SELECT * FROM check_clause(null, 'select * from t group by a, b'); clause | extstats -----------------+------------------ Group Key: a, b | s Clauses: a, b (1 row) -- use verbose option -- where SELECT * FROM check_clause('verbose', 'select * from t where a = 1 and b = 1'); clause | extstats -----------------------------------+---------------------------------------------- Filter: ((t.a = 1) AND (t.b = 1)) | public.s Clauses: ((t.a = 1) AND (t.b = 1)) (1 row) -- group by SELECT * FROM check_clause('verbose', 'select * from t group by a, b'); clause | extstats ---------------------+----------------------------- Group Key: t.a, t.b | public.s Clauses: t.a, t.b (1 row) -- display Ext Stats: twice?! -- HashAgg EXPLAIN (stats, costs off) select 1 from t group by a, b; QUERY PLAN ------------------------------------- HashAggregate Group Key: a, b -> Seq Scan on t Ext Stats: s Clauses: a, b (4 rows) SELECT * FROM check_clause(null, 'select * from t group by a, b'); clause | extstats -----------------+------------------ Group Key: a, b | s Clauses: a, b (1 row) -- GroupAgg set enable_hashagg to off; SET EXPLAIN (stats, costs off) select 1 from t group by a, b; QUERY PLAN ------------------------------------------- Group Group Key: a, b -> Sort Sort Key: a, b -> Seq Scan on t Ext Stats: s Clauses: a, b (6 rows) SELECT * FROM check_clause(null, 'select * from t group by a, b'); clause | extstats -----------------+------------------ Group Key: a, b | s Clauses: a, b (1 row) set enable_hashagg to on; SET -- Parallel -- shows twice set min_parallel_table_scan_size to '0'; SET EXPLAIN (stats, costs off) select 1 from t group by a, b; QUERY PLAN ------------------------------------- HashAggregate Group Key: a, b -> Seq Scan on t Ext Stats: s Clauses: a, b Ext Stats: s Clauses: a, b (5 rows) SELECT * FROM check_clause(null, 'select * from t group by a, b'); clause | extstats -----------------+------------------ Group Key: a, b | s Clauses: a, b (1 row) set min_parallel_table_scan_size to default; SET -- Parallel Hash -- This is the reason to show ExtStats twice set enable_parallel_hash to off; SET EXPLAIN (stats, costs off) select 1 from t group by a, b; QUERY PLAN ------------------------------------- HashAggregate Group Key: a, b -> Seq Scan on t Ext Stats: s Clauses: a, b (4 rows) SELECT * FROM check_clause(null, 'select * from t group by a, b'); clause | extstats -----------------+------------------ Group Key: a, b | s Clauses: a, b (1 row) set enable_parallel_hash to default; SET -- prepare statement: not display Ext Stats because Topmost command is EXECUTE, -- so isExplain_stats flag seems false PREPARE ps (INT, INT) AS SELECT * FROM t WHERE a = $1 AND b = $2; PREPARE EXPLAIN(stats, costs on, analyze) EXECUTE ps(5, 5); QUERY PLAN ------------------------------------------------------------------------------------------------------- Seq Scan on t (cost=0.00..1943.00 rows=10033 width=8) (actual time=0.024..13.807 rows=10000 loops=1) Filter: ((a = 5) AND (b = 5)) Rows Removed by Filter: 90000 Buffers: shared hit=443 Planning Time: 0.092 ms Execution Time: 14.271 ms (6 rows) SELECT * FROM check_clause(null, 'EXECUTE ps(5, 5)'); clause | extstats --------+---------- (0 rows) -- crean up DROP TABLE t; DROP TABLE DROP FUNCTION check_clause(text, text); DROP FUNCTION \dX List of extended statistics Schema | Name | Definition | Ndistinct | Dependencies | MCV --------+------+------------+-----------+--------------+----- (0 rows)
0002-Add-a-new-option-auto_explain.log_stats.patch
Description: Binary data
test.sql
Description: Binary data
0003-Add-a-new-tab-completion-for-EXPLAIN-STATS-on-psql.patch
Description: Binary data
0001-Add-a-new-option-STATS-to-EXPLAIN-command-r3.patch
Description: Binary data