On 11/9/19 12:33 PM, Mark Dilger wrote:


On 11/6/19 11:58 AM, Tomas Vondra wrote:
On Wed, Nov 06, 2019 at 08:54:40PM +0100, Tomas Vondra wrote:
On Mon, Oct 28, 2019 at 04:20:48PM +0100, Tomas Vondra wrote:
Hi,

PostgreSQL 10 introduced extended statistics, allowing us to consider
correlation between columns to improve estimates, and PostgreSQL 12
added support for MCV statistics. But we still had the limitation that
we only allowed using a single extended statistics per relation, i.e.
given a table with two extended stats

 CREATE TABLE t (a int, b int, c int, d int);
 CREATE STATISTICS s1 (mcv) ON a, b FROM t;
 CREATE STATISTICS s2 (mcv) ON c, d FROM t;

and a query

 SELECT * FROM t WHERE a = 1 AND b = 1 AND c = 1 AND d = 1;

we only ever used one of the statistics (and we considered them in a not
particularly well determined order).

This patch addresses this by using as many extended stats as possible,
by adding a loop to statext_mcv_clauselist_selectivity(). In each step
we pick the "best" applicable statistics (in the sense of covering the
most attributes) and factor it into the oveall estimate.

Tomas,

Your patch compiles and passes the regression tests for me on debian linux under master.

Since your patch does not include modified regression tests, I wrote a test that I expected to improve under this new code, but running it both before and after applying your patch, there is no change.

Ok, the attached test passes before applying your patch and fails afterward owing to the estimates improving and no longer matching the expected output. To be clear, this confirms your patch working as expected.

I haven't seen any crashes in several hours of running different tests, so I think it looks good.


--
Mark Dilger
diff --git a/src/test/regress/expected/stats_ext.out 
b/src/test/regress/expected/stats_ext.out
index b65228fa07..e89865e3ee 100644
--- a/src/test/regress/expected/stats_ext.out
+++ b/src/test/regress/expected/stats_ext.out
@@ -353,6 +353,48 @@ SELECT * FROM check_estimated_rows('SELECT COUNT(*) FROM 
ndistinct GROUP BY a, d
        500 |     50
 (1 row)
 
+-- muliple extended statistics
+CREATE TABLE multistats (
+       a INTEGER,
+       b INTEGER,
+       c INTEGER,
+       d INTEGER
+);
+-- Insert unique values
+INSERT INTO multistats (a, b, c, d)
+       SELECT i, i*2, i*3, i*4
+               FROM generate_series(1,10000) s(i);
+-- Duplidate one set of values 10000 times
+INSERT INTO multistats (a, b, c, d)
+       SELECT 0, 0, 0, 0
+               FROM generate_series(1,10000) s(i);
+-- estimates without mcv statistics
+ANALYZE multistats;
+SELECT * FROM check_estimated_rows('SELECT * FROM multistats WHERE a = 0 AND b 
= 0 AND c = 0 AND d = 0');
+ estimated | actual 
+-----------+--------
+      1250 |  10000
+(1 row)
+
+-- create some mcv statistics
+CREATE STATISTICS ms_ab (mcv) ON a, b FROM multistats;
+ANALYZE multistats;
+SELECT * FROM check_estimated_rows('SELECT * FROM multistats WHERE a = 0 AND b 
= 0 AND c = 0 AND d = 0');
+ estimated | actual 
+-----------+--------
+      2500 |  10000
+(1 row)
+
+-- create some mcv statistics
+CREATE STATISTICS ms_cd (mcv) ON c, d FROM multistats;
+ANALYZE multistats;
+SELECT * FROM check_estimated_rows('SELECT * FROM multistats WHERE a = 0 AND b 
= 0 AND c = 0 AND d = 0');
+ estimated | actual 
+-----------+--------
+      2500 |  10000
+(1 row)
+
+DROP TABLE multistats;
 -- functional dependencies tests
 CREATE TABLE functional_dependencies (
     filler1 TEXT,
diff --git a/src/test/regress/sql/stats_ext.sql 
b/src/test/regress/sql/stats_ext.sql
index 040ee97a1e..ba802d7490 100644
--- a/src/test/regress/sql/stats_ext.sql
+++ b/src/test/regress/sql/stats_ext.sql
@@ -224,6 +224,41 @@ SELECT * FROM check_estimated_rows('SELECT COUNT(*) FROM 
ndistinct GROUP BY b, c
 
 SELECT * FROM check_estimated_rows('SELECT COUNT(*) FROM ndistinct GROUP BY a, 
d');
 
+-- muliple extended statistics
+CREATE TABLE multistats (
+       a INTEGER,
+       b INTEGER,
+       c INTEGER,
+       d INTEGER
+);
+
+-- Insert unique values
+INSERT INTO multistats (a, b, c, d)
+       SELECT i, i*2, i*3, i*4
+               FROM generate_series(1,10000) s(i);
+-- Duplidate one set of values 10000 times
+INSERT INTO multistats (a, b, c, d)
+       SELECT 0, 0, 0, 0
+               FROM generate_series(1,10000) s(i);
+
+-- estimates without mcv statistics
+ANALYZE multistats;
+SELECT * FROM check_estimated_rows('SELECT * FROM multistats WHERE a = 0 AND b 
= 0 AND c = 0 AND d = 0');
+
+-- create some mcv statistics
+CREATE STATISTICS ms_ab (mcv) ON a, b FROM multistats;
+
+ANALYZE multistats;
+SELECT * FROM check_estimated_rows('SELECT * FROM multistats WHERE a = 0 AND b 
= 0 AND c = 0 AND d = 0');
+
+-- create some mcv statistics
+CREATE STATISTICS ms_cd (mcv) ON c, d FROM multistats;
+
+ANALYZE multistats;
+SELECT * FROM check_estimated_rows('SELECT * FROM multistats WHERE a = 0 AND b 
= 0 AND c = 0 AND d = 0');
+
+DROP TABLE multistats;
+
 -- functional dependencies tests
 CREATE TABLE functional_dependencies (
     filler1 TEXT,

Reply via email to