Hi All,

I've organized the discussion so far and improved the patch.
The issues and their status are below.

* Issues and status (or comment):

I've numbered them for ease of management.
====
T6. Changed option to show extended statistics (from VERBOSE to STATS)
    -> Done already on the previous patch

T4. Add new node (to resolve errors in cfbot and prepared statement)
    -> Done on the attached patch maybe

M2. Handle VERBOSE option more correctly
    -> Done on the attache patch

M4. Organize the specifications for the types of ExtStats to display
    -> Done. I understood the specifications (restrictions) below and wrote
it
       on explain.sgml.
        - depndencied or mcv of extended stats are displayed
        - ndistinct is excluded

T9. Add log_stats option to auto_explain
    -> Done on the attached patches

I2. Add tab completion to psql
    -> Done on the attached patches

T7. Add documents
    -> Partially completed.
        - Done:
            - doc/src/sgml/ref/explain.sgml
            - doc/src/sgml/auto-explain.sgml
        - Not started yet:
            - doc/src/sgml/ref/create_statistics.sgml
            - doc/src/sgml/perform.sgml
            - doc/src/sgml/planstats.sgml

T8. Add regression test (stats_ext.sql)
    -> Work in progress.

M5. Extstats showing twice issue
    -> Work in progress.
       The problem is related to Partial/Final groupby or Parallel query.
       I found that HashParallel is one of the causes to show it twice.
       (See result.txt)

I1. Fix error when query has nested OR condition
    -> Not started yet.
       As a starting point, it's good to understand the difference
       between maybe_extract_actual_clauses() and extract_actual_clauses(),
right?

T5. Measure overhead of the feature
    -> Not started yet.
       But I understood a conditions to measure overhead:
          - simple query without no extended stats
          - simple query with 100 extended stats
====


* Regarding to the attached patches:

This patch is WIP patch including the following fixes:
====
- 0001-Add-a-new-option-STATS-to-EXPLAIN-command-r3.patch
    - Rebased on 9926f854
    - Added a new struct Applied_ExtStats in plannode.h (T4)
        - Hopefully this will solve the issue related to
"-DCOPY_PARSE_PLAN_TREES"
          with cfbot.
        - To pass the extended statistics information from path to plan,
          it might be more appropriate to define a new structure in
primnode.h
          rather than plannode.h. Any advice would be appreciated.
    - Handled EXPLAIN(STATS, VERBOSE) option (M2)
        - Before the fix, schema name was always added to extended
statistics name,
          but with this patch, schema name is added to the following only
when
          VERBOSE option is selected:
            - Extended statistics name, table name, and column name
    - Added Supported extended statistics types in document (M4)

- 0002-Add-a-new-option-auto_explain.log_stats.patch
    - It allows to use auto_explain.log_stats option on auto_explain

- 0003-Add-a-new-tab-completion-for-EXPLAIN-STATS-on-psql.patch
    - When you run "EXPLAIN (<tab>" or "EXPLAIN (S<tab>" on psql,
      "STATS" string is displayed.
=====


Finally, the attached test.sql and result.txt are a test case (incomplete)
and
its results. When the test case is completed, it will be merged into
ext_stats.sql.

Any advice is welcome!

Regards,
Tatsuro Yamada
\set PROMPT1 '%[%033[1m%]%M %n@%/:%>%R%[%033[0m%]%# '
\set HISTFILE ~/.psql_history- :PORT
-- This function checks which clauses have extended statistics applied
create or replace function check_clause(text, text) returns table (clause text, 
extstats text)
language plpgsql as
$$
declare
        sql text;
    ln text;
    tmp text[];
    c text;
    s text;
begin
        if ($1 is not null) then
                sql := format('explain (stats, costs off, %s) %s', $1, $2);
        else
                sql := format('explain (stats, costs off) %s', $2);
        end if;

    for ln in
        execute sql
    loop
        tmp := regexp_match(ln, 'Filter: .*|Group Key: .*');
        if tmp is not null then
            c = tmp[1]::text;
        end if;

        tmp := regexp_match(ln, 'Ext Stats: (.*)');
        if tmp is not null then
            s = tmp[1]::text;
        end if;

        if (c is not null) and (s is not null) then
            return query select c, s;
            c := null;
            s := null;
        end if;
    end loop;
end;
$$;
CREATE FUNCTION
-- prepare
create table t (a int, b int);
CREATE TABLE
insert into t select mod(i, 10), mod(i, 10) from generate_series(1, 100000) 
s(i);
INSERT 0 100000
-- without extended stats
-- where
EXPLAIN (stats, costs off) select * from t where a = 1 and b = 1;
           QUERY PLAN            
---------------------------------
 Seq Scan on t
   Filter: ((a = 1) AND (b = 1))
(2 rows)

SELECT * FROM check_clause(null, 'select * from t where a = 1 and b = 1');
 clause | extstats 
--------+----------
(0 rows)

-- group by
EXPLAIN (stats, costs off) select 1 from t group by a, b;
     QUERY PLAN      
---------------------
 HashAggregate
   Group Key: a, b
   ->  Seq Scan on t
(3 rows)

SELECT * FROM check_clause(null, 'select * from t group by a, b');
 clause | extstats 
--------+----------
(0 rows)

-- create extended stats
create statistics s on a, b from t;
CREATE STATISTICS
analyze t;
ANALYZE
\dX
                   List of extended statistics
 Schema | Name | Definition  | Ndistinct | Dependencies |   MCV   
--------+------+-------------+-----------+--------------+---------
 public | s    | a, b FROM t | defined   | defined      | defined
(1 row)

-- show extended stats without verbose option
-- where
-- EXPLAIN (stats, costs off) select * from t where a = 1 and b = 1;
SELECT * FROM check_clause(null, 'select * from t where a = 1 and b = 1');
            clause             |             extstats              
-------------------------------+-----------------------------------
 Filter: ((a = 1) AND (b = 1)) | s  Clauses: ((a = 1) AND (b = 1))
(1 row)

-- group by
-- EXPLAIN (stats, costs off) select 1 from t group by a, b;
SELECT * FROM check_clause(null, 'select * from t group by a, b');
     clause      |     extstats     
-----------------+------------------
 Group Key: a, b | s  Clauses: a, b
(1 row)

-- use verbose option
-- where
SELECT * FROM check_clause('verbose', 'select * from t where a = 1 and b = 1');
              clause               |                   extstats                 
  
-----------------------------------+----------------------------------------------
 Filter: ((t.a = 1) AND (t.b = 1)) | public.s  Clauses: ((t.a = 1) AND (t.b = 
1))
(1 row)

-- group by
SELECT * FROM check_clause('verbose', 'select * from t group by a, b');
       clause        |          extstats           
---------------------+-----------------------------
 Group Key: t.a, t.b | public.s  Clauses: t.a, t.b
(1 row)

-- display Ext Stats: twice?!
-- HashAgg
EXPLAIN (stats, costs off) select 1 from t group by a, b;
             QUERY PLAN              
-------------------------------------
 HashAggregate
   Group Key: a, b
   ->  Seq Scan on t
         Ext Stats: s  Clauses: a, b
(4 rows)

SELECT * FROM check_clause(null, 'select * from t group by a, b');
     clause      |     extstats     
-----------------+------------------
 Group Key: a, b | s  Clauses: a, b
(1 row)

-- GroupAgg
set enable_hashagg to off;
SET
EXPLAIN (stats, costs off) select 1 from t group by a, b;
                QUERY PLAN                 
-------------------------------------------
 Group
   Group Key: a, b
   ->  Sort
         Sort Key: a, b
         ->  Seq Scan on t
               Ext Stats: s  Clauses: a, b
(6 rows)

SELECT * FROM check_clause(null, 'select * from t group by a, b');
     clause      |     extstats     
-----------------+------------------
 Group Key: a, b | s  Clauses: a, b
(1 row)

set enable_hashagg to on;
SET
-- Parallel
-- shows twice
set min_parallel_table_scan_size to '0';
SET
EXPLAIN (stats, costs off) select 1 from t group by a, b;
             QUERY PLAN              
-------------------------------------
 HashAggregate
   Group Key: a, b
   ->  Seq Scan on t
         Ext Stats: s  Clauses: a, b
         Ext Stats: s  Clauses: a, b
(5 rows)

SELECT * FROM check_clause(null, 'select * from t group by a, b');
     clause      |     extstats     
-----------------+------------------
 Group Key: a, b | s  Clauses: a, b
(1 row)

set min_parallel_table_scan_size to default;
SET
-- Parallel Hash
-- This is the reason to show ExtStats twice
set enable_parallel_hash to off;
SET
EXPLAIN (stats, costs off) select 1 from t group by a, b;
             QUERY PLAN              
-------------------------------------
 HashAggregate
   Group Key: a, b
   ->  Seq Scan on t
         Ext Stats: s  Clauses: a, b
(4 rows)

SELECT * FROM check_clause(null, 'select * from t group by a, b');
     clause      |     extstats     
-----------------+------------------
 Group Key: a, b | s  Clauses: a, b
(1 row)

set enable_parallel_hash to default;
SET
-- prepare statement: not display Ext Stats because Topmost command is EXECUTE,
-- so isExplain_stats flag seems false
PREPARE ps (INT, INT) AS SELECT * FROM t WHERE a = $1 AND b = $2;
PREPARE
EXPLAIN(stats, costs on, analyze) EXECUTE ps(5, 5);
                                              QUERY PLAN                        
                       
-------------------------------------------------------------------------------------------------------
 Seq Scan on t  (cost=0.00..1943.00 rows=10033 width=8) (actual 
time=0.024..13.807 rows=10000 loops=1)
   Filter: ((a = 5) AND (b = 5))
   Rows Removed by Filter: 90000
   Buffers: shared hit=443
 Planning Time: 0.092 ms
 Execution Time: 14.271 ms
(6 rows)

SELECT * FROM check_clause(null, 'EXECUTE ps(5, 5)');
 clause | extstats 
--------+----------
(0 rows)

-- crean up
DROP TABLE t;
DROP TABLE
DROP FUNCTION check_clause(text, text);
DROP FUNCTION
\dX
                 List of extended statistics
 Schema | Name | Definition | Ndistinct | Dependencies | MCV 
--------+------+------------+-----------+--------------+-----
(0 rows)

Attachment: 0002-Add-a-new-option-auto_explain.log_stats.patch
Description: Binary data

Attachment: test.sql
Description: Binary data

Attachment: 0003-Add-a-new-tab-completion-for-EXPLAIN-STATS-on-psql.patch
Description: Binary data

Attachment: 0001-Add-a-new-option-STATS-to-EXPLAIN-command-r3.patch
Description: Binary data

Reply via email to