understand pg_ndistinct type && Why with(autovacuum_enabled=off) some query estimate 100, some is 200.

jian he Wed, 14 Sep 2022 09:17:26 -0700

source:
https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob_plain;f=src/test/regress/expected/stats_ext.out;h=431b3fa3de1f4f87205e7e27a99ef1cf337f1590;hb=676887a3b0b8e3c0348ac3f82ab0d16e9a24bd43


-- n-distinct tests
CREATE TABLE ndistinct (
    filler1 TEXT,
    filler2 NUMERIC,
    a INT,
    b INT,
    filler3 DATE,
    c INT,
    d INT
)
WITH (autovacuum_enabled = off);
-- over-estimates when using only per-column statistics
INSERT INTO ndistinct (a, b, c, filler1)
     SELECT i/100, i/100, i/100, cash_words((i/100)::money)
       FROM generate_series(1,1000) s(i);
ANALYZE ndistinct;
-- Group Aggregate, due to over-estimate of the number of groups
SELECT * FROM check_estimated_rows('SELECT COUNT(*) FROM ndistinct
GROUP BY a, b');
 estimated | actual
-----------+--------
       100 |     11
(1 row)

SELECT * FROM check_estimated_rows('SELECT COUNT(*) FROM ndistinct
GROUP BY b, c');
 estimated | actual
-----------+--------
       100 |     11
(1 row)

SELECT * FROM check_estimated_rows('SELECT COUNT(*) FROM ndistinct
GROUP BY a, b, c');
 estimated | actual
-----------+--------
       100 |     11
(1 row)

SELECT * FROM check_estimated_rows('SELECT COUNT(*) FROM ndistinct
GROUP BY a, b, c, d');
 estimated | actual
-----------+--------
       200 |     11
(1 row)

SELECT * FROM check_estimated_rows('SELECT COUNT(*) FROM ndistinct
GROUP BY b, c, d');
 estimated | actual
-----------+--------
       200 |     11
(1 row)

-- correct command
CREATE STATISTICS s10 ON a, b, c FROM ndistinct;
ANALYZE ndistinct;
SELECT s.stxkind, d.stxdndistinct
  FROM pg_statistic_ext s, pg_statistic_ext_data d
 WHERE s.stxrelid = 'ndistinct'::regclass
   AND d.stxoid = s.oid;
 stxkind |                    stxdndistinct
---------+-----------------------------------------------------
 {d,f,m} | {"3, 4": 11, "3, 6": 11, "4, 6": 11, "3, 4, 6": 11}
(1 row)

---------------------------------------------------------------

I don't understand the query GROUP BY b, c, d estimate is 200, while  GROUP
BY a, b, c is 100.
I also don't understand the last query stxdndistinct result.
I know what d,f,m refer to.
I may found the pg_ndistinct type source:
https://doxygen.postgresql.org/mvdistinct_8c.html#a03c06f5f0db3fc22cd5323ea04906a7c
But my C knowledge is limited.

Is there any way in sql level to query more info (like base type)
about pg_ndistinct
?

-- 
 I recommend David Deutsch's <<The Beginning of Infinity>>

  Jian

understand pg_ndistinct type && Why with(autovacuum_enabled=off) some query estimate 100, some is 200.

Reply via email to