Re: extended stats on partitioned tables

2022-01-16 Thread Tomas Vondra
I've pushed the part adding stxdinherit flag to the catalog, so that on 
master we build statistics both with and without data from the child 
relations.


I'm not going to push the ACL refactoring. We have similar code on 
various other places (not addressed by the proposed patch), and it'd 
make backpatching harder. So I'm not sure it'd be an improvement.


In any case, I'm going to mark this as committed. Justin, if you think 
we should reconsider the ACL refactoring, please submit it as a separate 
patch. It seems mostly unrelated to the issue this thread was about.



regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: extended stats on partitioned tables

2022-01-15 Thread Tomas Vondra




On 1/15/22 19:35, Tomas Vondra wrote:

On 1/15/22 06:11, Justin Pryzby wrote:

On Mon, Dec 13, 2021 at 09:40:09PM +0100, Tomas Vondra wrote:

1) If the table is a separate relation (not part of an inheritance
tree), this should make no difference. -> OK

2) If the table is using "old" inheritance, this reverts back to
pre-regression behavior. So people will keep using the old statistics
until the ANALYZE, and we need to tell them to ANALYZE or something.

3) If the table is using partitioning, it's guaranteed to be empty and
there are no stats at all. Again, we should tell people to run ANALYZE.


I think these can be mentioned in the commit message, which can end up 
in the

minor release notes as a recommendation to rerun ANALYZE.



Good point. I pushed the 0002 part and added a short paragraph 
suggesting ANALYZE might be necessary. I did not go into details about 
the individual cases, because that'd be too much for a commit message.



Thanks for pushing 0001.



Thanks for posting the patches!

I've pushed the second part - attached are the two remaining parts. I'll 
wait a bit before pushing the rebased 0001 (which goes into master 
branch only). Not sure about 0002 - I'm not convinced the refactored ACL 
checks are an improvement, but I'll think about it.




BTW when backpatching the first part, I had to decide what to do about 
tests. The 10 & 11 branches did not have the check_estimated_rows() 
function. I considered removing the tests, reworking the tests not to 
need the function, or adding the function. Ultimately I added the 
function, which seemed like the best option.



regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: extended stats on partitioned tables

2022-01-15 Thread Tomas Vondra

On 1/15/22 06:11, Justin Pryzby wrote:

On Mon, Dec 13, 2021 at 09:40:09PM +0100, Tomas Vondra wrote:

1) If the table is a separate relation (not part of an inheritance
tree), this should make no difference. -> OK

2) If the table is using "old" inheritance, this reverts back to
pre-regression behavior. So people will keep using the old statistics
until the ANALYZE, and we need to tell them to ANALYZE or something.

3) If the table is using partitioning, it's guaranteed to be empty and
there are no stats at all. Again, we should tell people to run ANALYZE.


I think these can be mentioned in the commit message, which can end up in the
minor release notes as a recommendation to rerun ANALYZE.



Good point. I pushed the 0002 part and added a short paragraph 
suggesting ANALYZE might be necessary. I did not go into details about 
the individual cases, because that'd be too much for a commit message.



Thanks for pushing 0001.



Thanks for posting the patches!

I've pushed the second part - attached are the two remaining parts. I'll 
wait a bit before pushing the rebased 0001 (which goes into master 
branch only). Not sure about 0002 - I'm not convinced the refactored ACL 
checks are an improvement, but I'll think about it.


regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL CompanyFrom e065e54d1961a4e295ebe77febbe6a3307d142b5 Mon Sep 17 00:00:00 2001
From: Tomas Vondra 
Date: Sun, 12 Dec 2021 00:07:27 +0100
Subject: [PATCH 1/2] Add stxdinherit flag to pg_statistic_ext_data

The support for extended statistics on inheritance trees was somewhat
problematic, because the catalog pg_statistic_ext_data did not have a
flag identifying whether the data was built with/without data for the
child relations.

For partitioned tables we've been able to work around this because the
non-leaf relations can't contain data, so there's really just one set of
statistics to store. But for regular inheritance trees we had to pick,
and there is no clearly better option - we need both.

This introduces the pg_statistic_ext_data.stxdinherit flag, analogous to
pg_statistic.stainherit, and modifies analyze to build statistics for
both cases.

This relaxes the relationship between the two catalogs - until now we've
created the pg_statistic_ext_data one when creating the statistics, and
then only updated it later. There was always 1:1 relationship between
rows in the two catalogs. With this change we no longer insert any data
rows while creating statistics, because we don't know which flag value
to create, and it seems wasteful to initialize both for every relation.
The there may be 0, 1 or 2 data rows for each statistics definition.

Patch by me, with extensive improvements and fixes by Justin Pryzby.

Author: Tomas Vondra, Justin Pryzby
Reviewed-by: Tomas Vondra, Justin Pryzby
Discussion: https://postgr.es/m/20210923212624.GI831%40telsasoft.com
---
 doc/src/sgml/catalogs.sgml  |  23 +++
 src/backend/catalog/system_views.sql|   2 +
 src/backend/commands/analyze.c  |  28 +---
 src/backend/commands/statscmds.c|  72 +-
 src/backend/optimizer/util/plancat.c| 147 
 src/backend/statistics/dependencies.c   |  22 ++-
 src/backend/statistics/extended_stats.c |  75 +-
 src/backend/statistics/mcv.c|   9 +-
 src/backend/statistics/mvdistinct.c |   5 +-
 src/backend/utils/adt/selfuncs.c|  37 +
 src/backend/utils/cache/syscache.c  |   6 +-
 src/include/catalog/pg_statistic_ext_data.h |   4 +-
 src/include/commands/defrem.h   |   1 +
 src/include/nodes/pathnodes.h   |   1 +
 src/include/statistics/statistics.h |  11 +-
 src/test/regress/expected/rules.out |   2 +
 src/test/regress/expected/stats_ext.out |  31 +++--
 src/test/regress/sql/stats_ext.sql  |  13 +-
 18 files changed, 254 insertions(+), 235 deletions(-)

diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 03e2537b07d..2aeb2ef346e 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -7521,6 +7521,19 @@ SCRAM-SHA-256$iteration count:
created with CREATE STATISTICS.
   
 
+  
+   Normally there is one entry, with stxdinherit =
+   false, for each statistics object that has been analyzed.
+   If the table has inheritance children, a second entry with
+   stxdinherit = true is also created.
+   This row represents the statistics object over the inheritance tree, i.e.,
+   statistics for the data you'd see with
+   SELECT * FROM table*,
+   whereas the stxdinherit = false row
+   represents the results of
+   SELECT * FROM ONLY table.
+  
+
   
Like pg_statistic,
pg_statistic_ext_data should not be
@@ -7560,6 +7573,16 @@ SCRAM-SHA-256$iteration count:
   
  
 
+ 
+  
+   stxdinherit bool
+  
+  
+   If true, the stats include inheritance child columns, not just the

Re: extended stats on partitioned tables

2022-01-14 Thread Justin Pryzby
On Mon, Dec 13, 2021 at 09:40:09PM +0100, Tomas Vondra wrote:
> 1) If the table is a separate relation (not part of an inheritance
> tree), this should make no difference. -> OK
> 
> 2) If the table is using "old" inheritance, this reverts back to
> pre-regression behavior. So people will keep using the old statistics
> until the ANALYZE, and we need to tell them to ANALYZE or something.
> 
> 3) If the table is using partitioning, it's guaranteed to be empty and
> there are no stats at all. Again, we should tell people to run ANALYZE.

I think these can be mentioned in the commit message, which can end up in the
minor release notes as a recommendation to rerun ANALYZE.

Thanks for pushing 0001.

-- 
Justin




Re: extended stats on partitioned tables

2021-12-13 Thread Tomas Vondra
On 12/13/21 14:53, Justin Pryzby wrote:
> On Sun, Dec 12, 2021 at 11:23:19PM +0100, Tomas Vondra wrote:
>> On 12/12/21 22:32, Justin Pryzby wrote:
>>> On Sun, Dec 12, 2021 at 05:17:10AM +0100, Tomas Vondra wrote:
>>>> The one thing bugging me a bit is that the regression test checks only a
>>>> GROUP BY query. It'd be nice to add queries testing MCV/dependencies
>>>> too, but that seems tricky because most queries will use per-partitions
>>>> stats.
>>>
>>> You mean because the quals are pushed down to the scan node.
>>>
>>> Does that indicate a deficiency ?
>>>
>>> If extended stats are collected for a parent table, selectivity estimates 
>>> based
>>> from the parent would be better; but instead we use uncorrected column
>>> estimates from the child tables.
>>>
>>> From what I see, we could come up with a way to avoid the pushdown, 
>>> involving
>>> volatile functions/foreign tables/RLS/window functions/SRF/wholerow 
>>> vars/etc.
>>> But would it be better if extended stats objects on partitioned tables were 
>>> to
>>> collect stats for both parent AND CHILD ?  I'm not sure.  Maybe that's the
>>> wrong solution, but maybe we should still document that extended stats on
>>> (empty) parent tables are often themselves not used/useful for selectivity
>>> estimates, and the user should instead (or in addition) create stats on 
>>> child
>>> tables.
>>>
>>> Or, maybe if there's no extended stats on the child tables, stats on the 
>>> parent
>>> table should be consulted ?
>>
>> Maybe, but that seems like a mostly separate improvement. At this point I'm
>> interested only in testing the behavior implemented in the current patches.
> 
> I don't want to change the scope of the patch, or this thread, but my point is
> that the behaviour already changed once (the original regression) and now 
> we're
> planning to change it again to fix that, so we ought to decide on the expected
> behavior before writing tests to verify it.
> 

OK. Makes sense.

> I think it may be impossible to use the "dependencies" statistic with 
> inherited
> stats.  Normally the quals would be pushed down to the child tables.  But, if
> they weren't pushed down, they'd be attached to something other than a scan
> node on the parent table, so the stats on that table wouldn't apply (right?).
> 

Yeah, that's probably right. But I'm not 100% sure the whole inheritance
tree can't be treated as a single relation by some queries ...

> Maybe the useless stats types should have been prohibited on partitioned 
> tables
> since v10.  It's too late to change that, but perhaps now they shouldn't even
> be collected during analyze.  The dependencies and MCV paths are never called
> with rte->inh==true, so maybe we should Assert(!inh), or add a comment to that
> effect.  Or the regression tests should "memorialize" the behavior.  I'm still
> thinking about it.
> 

Yeah, we can't really prohibit them in backbranches - that'd mean some
CREATE STATISTICS commands suddenly start failing, which would be quite
annoying. Not building them for partitioned tables seems like a better
option - BuildRelationExtStatistics can check relkind and pick what to
ignore. But I'd choose to do that in a separate patch, probably - after
all, it shouldn't really change the behavior of any tests/queries, no?

This reminds me we need to consider if these patches could cause any
issues. The way I see it is this:

1) If the table is a separate relation (not part of an inheritance
tree), this should make no difference. -> OK

2) If the table is using "old" inheritance, this reverts back to
pre-regression behavior. So people will keep using the old statistics
until the ANALYZE, and we need to tell them to ANALYZE or something.

3) If the table is using partitioning, it's guaranteed to be empty and
there are no stats at all. Again, we should tell people to run ANALYZE.

Of course, we can't be sure query plans will change in the right
direction :-(


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: extended stats on partitioned tables

2021-12-13 Thread Tomas Vondra
if (!has_stats_of_kind(rel->statlist, STATS_EXT_MCV))
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index 10895fb287..1010d5caa8 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -3913,6 +3913,14 @@ estimate_multivariate_ndistinct(PlannerInfo *root, RelOptInfo *rel,
 	Oid			statOid = InvalidOid;
 	MVNDistinct *stats;
 	StatisticExtInfo *matched_info = NULL;
+	RangeTblEntry		*rte = planner_rt_fetch(rel->relid, root);
+
+	/*
+	 * When dealing with inheritance trees, ignore extended stats (which were
+	 * built without data from child rels, and thus do not represent them).
+	 */
+	if (rte->inh)
+		return false;
 
 	/* bail out immediately if the table has no extended statistics */
 	if (!rel->statlist)
@@ -5222,6 +5230,7 @@ examine_variable(PlannerInfo *root, Node *node, int varRelid,
 		foreach(slist, onerel->statlist)
 		{
 			StatisticExtInfo *info = (StatisticExtInfo *) lfirst(slist);
+			RangeTblEntry	 *rte = planner_rt_fetch(onerel->relid, root);
 			ListCell   *expr_item;
 			int			pos;
 
@@ -5232,6 +5241,14 @@ examine_variable(PlannerInfo *root, Node *node, int varRelid,
 			if (vardata->statsTuple)
 break;
 
+			/*
+			 * When dealing with inheritance trees, ignore extended stats (which
+			 * were built without data from child rels, and so do not represent
+			 * them).
+			 */
+			if (rte->inh)
+break;
+
 			/* skip stats without per-expression stats */
 			if (info->kind != STATS_EXT_EXPRESSIONS)
 continue;
diff --git a/src/test/regress/expected/stats_ext.out b/src/test/regress/expected/stats_ext.out
index c60ba45aba..5410f58f7f 100644
--- a/src/test/regress/expected/stats_ext.out
+++ b/src/test/regress/expected/stats_ext.out
@@ -176,6 +176,30 @@ CREATE STATISTICS ab1_a_b_stats ON a, b FROM ab1;
 ANALYZE ab1;
 DROP TABLE ab1 CASCADE;
 NOTICE:  drop cascades to table ab1c
+-- Tests for stats with inheritance
+CREATE TABLE stxdinh(i int, j int);
+CREATE TABLE stxdinh1() INHERITS(stxdinh);
+INSERT INTO stxdinh SELECT a, a/10 FROM generate_series(1,9)a;
+INSERT INTO stxdinh1 SELECT a, a FROM generate_series(1,999)a;
+VACUUM ANALYZE stxdinh, stxdinh1;
+-- Ensure non-inherited stats are not applied to inherited query
+-- Without stats object, it looks like this
+SELECT * FROM check_estimated_rows('SELECT * FROM stxdinh* GROUP BY 1,2');
+ estimated | actual 
+---+
+  1000 |   1008
+(1 row)
+
+CREATE STATISTICS stxdinh ON i,j FROM stxdinh;
+VACUUM ANALYZE stxdinh, stxdinh1;
+-- Since the stats object does not include inherited stats, it should not affect the estimates
+SELECT * FROM check_estimated_rows('SELECT * FROM stxdinh* GROUP BY 1,2');
+ estimated | actual 
+---+
+  1000 |   1008
+(1 row)
+
+DROP TABLE stxdinh, stxdinh1;
 -- basic test for statistics on expressions
 CREATE TABLE ab1 (a INTEGER, b INTEGER, c TIMESTAMP, d TIMESTAMPTZ);
 -- expression stats may be built on a single expression column
diff --git a/src/test/regress/sql/stats_ext.sql b/src/test/regress/sql/stats_ext.sql
index 6fb37962a7..a196d5e2f8 100644
--- a/src/test/regress/sql/stats_ext.sql
+++ b/src/test/regress/sql/stats_ext.sql
@@ -112,6 +112,21 @@ CREATE STATISTICS ab1_a_b_stats ON a, b FROM ab1;
 ANALYZE ab1;
 DROP TABLE ab1 CASCADE;
 
+-- Tests for stats with inheritance
+CREATE TABLE stxdinh(i int, j int);
+CREATE TABLE stxdinh1() INHERITS(stxdinh);
+INSERT INTO stxdinh SELECT a, a/10 FROM generate_series(1,9)a;
+INSERT INTO stxdinh1 SELECT a, a FROM generate_series(1,999)a;
+VACUUM ANALYZE stxdinh, stxdinh1;
+-- Ensure non-inherited stats are not applied to inherited query
+-- Without stats object, it looks like this
+SELECT * FROM check_estimated_rows('SELECT * FROM stxdinh* GROUP BY 1,2');
+CREATE STATISTICS stxdinh ON i,j FROM stxdinh;
+VACUUM ANALYZE stxdinh, stxdinh1;
+-- Since the stats object does not include inherited stats, it should not affect the estimates
+SELECT * FROM check_estimated_rows('SELECT * FROM stxdinh* GROUP BY 1,2');
+DROP TABLE stxdinh, stxdinh1;
+
 -- basic test for statistics on expressions
 CREATE TABLE ab1 (a INTEGER, b INTEGER, c TIMESTAMP, d TIMESTAMPTZ);
 
-- 
2.31.1

From 54e04ae2ea9fcb8d83f551c911771fa5018eff14 Mon Sep 17 00:00:00 2001
From: Tomas Vondra 
Date: Sat, 11 Dec 2021 22:38:08 +0100
Subject: [PATCH 2/4] Build inherited extended stats on partitioned tables

Commit 859b3003de disabled building of extended stats for inheritance
trees, to prevent updating the same catalog row twice. That resolved the
issue, but it also means there are no extended stats for partitioned
tables, because the (!inh) case has empty sample. This is a regression,
affecting queries that don't estimate the leaf relations directly.

But because partitioned tables are empty, we can invert the condition
and build statistics only for the case with inheritance, without losing
anything. And we can consider them when calculating estimates.

Report and patch by Justin Pr

Re: extended stats on partitioned tables

2021-12-13 Thread Justin Pryzby
On Sun, Dec 12, 2021 at 11:23:19PM +0100, Tomas Vondra wrote:
> On 12/12/21 22:32, Justin Pryzby wrote:
> > On Sun, Dec 12, 2021 at 05:17:10AM +0100, Tomas Vondra wrote:
> > > The one thing bugging me a bit is that the regression test checks only a
> > > GROUP BY query. It'd be nice to add queries testing MCV/dependencies
> > > too, but that seems tricky because most queries will use per-partitions
> > > stats.
> > 
> > You mean because the quals are pushed down to the scan node.
> > 
> > Does that indicate a deficiency ?
> > 
> > If extended stats are collected for a parent table, selectivity estimates 
> > based
> > from the parent would be better; but instead we use uncorrected column
> > estimates from the child tables.
> > 
> > From what I see, we could come up with a way to avoid the pushdown, 
> > involving
> > volatile functions/foreign tables/RLS/window functions/SRF/wholerow 
> > vars/etc.
> > But would it be better if extended stats objects on partitioned tables were 
> > to
> > collect stats for both parent AND CHILD ?  I'm not sure.  Maybe that's the
> > wrong solution, but maybe we should still document that extended stats on
> > (empty) parent tables are often themselves not used/useful for selectivity
> > estimates, and the user should instead (or in addition) create stats on 
> > child
> > tables.
> > 
> > Or, maybe if there's no extended stats on the child tables, stats on the 
> > parent
> > table should be consulted ?
> 
> Maybe, but that seems like a mostly separate improvement. At this point I'm
> interested only in testing the behavior implemented in the current patches.

I don't want to change the scope of the patch, or this thread, but my point is
that the behaviour already changed once (the original regression) and now we're
planning to change it again to fix that, so we ought to decide on the expected
behavior before writing tests to verify it.

I think it may be impossible to use the "dependencies" statistic with inherited
stats.  Normally the quals would be pushed down to the child tables.  But, if
they weren't pushed down, they'd be attached to something other than a scan
node on the parent table, so the stats on that table wouldn't apply (right?).  

Maybe the useless stats types should have been prohibited on partitioned tables
since v10.  It's too late to change that, but perhaps now they shouldn't even
be collected during analyze.  The dependencies and MCV paths are never called
with rte->inh==true, so maybe we should Assert(!inh), or add a comment to that
effect.  Or the regression tests should "memorialize" the behavior.  I'm still
thinking about it.

-- 
Justin




Re: extended stats on partitioned tables

2021-12-13 Thread Justin Pryzby
On Sun, Dec 12, 2021 at 10:29:39PM +0100, Tomas Vondra wrote:
> > In your 0003 patch, the "if inh: break" isn't removed from 
> > examine_variable(),
> > but the corresponding thing is removed everywhere else.
> 
> Ah, you're right. And it wasn't updated in the 0002 patch either - it
> should do the relkind check too, to allow partitioned tables. Fixed.

I think you fixed it in 0002 (thanks) but still wasn't removed from 0003?

In these comments:
+* When dealing with regular inheritance trees, ignore extended stats
+* (which were built without data from child rels, and thus do not
+* represent them). For partitioned tables data from partitions are
+* in the stats (and there's no data in the non-leaf relations), so
+* in this case we do consider extended stats.

I suggest to add a comment after "For partitioned tables".

-- 
Justin




Re: extended stats on partitioned tables

2021-12-12 Thread Tomas Vondra




On 12/12/21 22:32, Justin Pryzby wrote:

On Sun, Dec 12, 2021 at 05:17:10AM +0100, Tomas Vondra wrote:

The one thing bugging me a bit is that the regression test checks only a
GROUP BY query. It'd be nice to add queries testing MCV/dependencies
too, but that seems tricky because most queries will use per-partitions
stats.


You mean because the quals are pushed down to the scan node.

Does that indicate a deficiency ?

If extended stats are collected for a parent table, selectivity estimates based
from the parent would be better; but instead we use uncorrected column
estimates from the child tables.

 From what I see, we could come up with a way to avoid the pushdown, involving
volatile functions/foreign tables/RLS/window functions/SRF/wholerow vars/etc.
 > But would it be better if extended stats objects on partitioned 

tables were to

collect stats for both parent AND CHILD ?  I'm not sure.  Maybe that's the
wrong solution, but maybe we should still document that extended stats on
(empty) parent tables are often themselves not used/useful for selectivity
estimates, and the user should instead (or in addition) create stats on child
tables.

Or, maybe if there's no extended stats on the child tables, stats on the parent
table should be consulted ?



Maybe, but that seems like a mostly separate improvement. At this point 
I'm interested only in testing the behavior implemented in the current 
patches.



regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: extended stats on partitioned tables

2021-12-12 Thread Justin Pryzby
On Sun, Dec 12, 2021 at 10:29:39PM +0100, Tomas Vondra wrote:
> On 12/12/21 18:52, Justin Pryzby wrote:
> That's mostly a conscious choice, so that I don't have to include
> parsetree.h. But maybe that'd be better ...
> 
> > The regression tests changed as a result of not populating stx_data; I think
> > it's may be better to update like this:
> > 
> > SELECT stxname, stxdndistinct, stxddependencies, stxdmcv, stxoid IS NULL
> >   FROM pg_statistic_ext s LEFT JOIN pg_statistic_ext_data d
> >   ON d.stxoid = s.oid
> >   WHERE s.stxname = 'ab1_a_b_stats';
> > 
> 
> Not sure I understand. Why would this be better than inner join?

It shows that there's an entry in pg_statistic_ext and not one in ext_data,
rather than that it's not in at least one of the catalogs.  Which is nice to
show since as you say it's no longer 1:1.

> Thanks, fixed. Can you read through the commit messages and check the
> attribution is correct for all the patches?

Seems fine.

-- 
Justin




Re: extended stats on partitioned tables

2021-12-12 Thread Justin Pryzby
On Sun, Dec 12, 2021 at 05:17:10AM +0100, Tomas Vondra wrote:
> The one thing bugging me a bit is that the regression test checks only a
> GROUP BY query. It'd be nice to add queries testing MCV/dependencies
> too, but that seems tricky because most queries will use per-partitions
> stats.

You mean because the quals are pushed down to the scan node.

Does that indicate a deficiency ?

If extended stats are collected for a parent table, selectivity estimates based
from the parent would be better; but instead we use uncorrected column
estimates from the child tables.

>From what I see, we could come up with a way to avoid the pushdown, involving
volatile functions/foreign tables/RLS/window functions/SRF/wholerow vars/etc.

But would it be better if extended stats objects on partitioned tables were to
collect stats for both parent AND CHILD ?  I'm not sure.  Maybe that's the
wrong solution, but maybe we should still document that extended stats on
(empty) parent tables are often themselves not used/useful for selectivity
estimates, and the user should instead (or in addition) create stats on child
tables.

Or, maybe if there's no extended stats on the child tables, stats on the parent
table should be consulted ?

-- 
Justin




Re: extended stats on partitioned tables

2021-12-12 Thread Tomas Vondra
  |  | 
-(1 row)
+ stxname | stxdndistinct | stxddependencies | stxdmcv 
+-+---+--+-
+(0 rows)
 
 ALTER STATISTICS ab1_a_b_stats SET STATISTICS -1;
 \d+ ab1
@@ -192,11 +191,18 @@ SELECT * FROM check_estimated_rows('SELECT * FROM stxdinh* GROUP BY 1,2');
 
 CREATE STATISTICS stxdinh ON i,j FROM stxdinh;
 VACUUM ANALYZE stxdinh, stxdinh1;
--- Since the stats object does not include inherited stats, it should not affect the estimates
+-- See if the extended stats affect the estimates
 SELECT * FROM check_estimated_rows('SELECT * FROM stxdinh* GROUP BY 1,2');
  estimated | actual 
 ---+
-  1000 |   1008
+  1008 |   1008
+(1 row)
+
+-- Ensure correct (non-inherited) stats are applied to inherited query
+SELECT * FROM check_estimated_rows('SELECT * FROM ONLY stxdinh GROUP BY 1,2');
+ estimated | actual 
+---+
+ 9 |  9
 (1 row)
 
 DROP TABLE stxdinh, stxdinh1;
diff --git a/src/test/regress/sql/stats_ext.sql b/src/test/regress/sql/stats_ext.sql
index 01383e8f5f..8deb010b3c 100644
--- a/src/test/regress/sql/stats_ext.sql
+++ b/src/test/regress/sql/stats_ext.sql
@@ -123,8 +123,10 @@ VACUUM ANALYZE stxdinh, stxdinh1;
 SELECT * FROM check_estimated_rows('SELECT * FROM stxdinh* GROUP BY 1,2');
 CREATE STATISTICS stxdinh ON i,j FROM stxdinh;
 VACUUM ANALYZE stxdinh, stxdinh1;
--- Since the stats object does not include inherited stats, it should not affect the estimates
+-- See if the extended stats affect the estimates
 SELECT * FROM check_estimated_rows('SELECT * FROM stxdinh* GROUP BY 1,2');
+-- Ensure correct (non-inherited) stats are applied to inherited query
+SELECT * FROM check_estimated_rows('SELECT * FROM ONLY stxdinh GROUP BY 1,2');
 DROP TABLE stxdinh, stxdinh1;
 
 -- Ensure inherited stats ARE applied to inherited query in partitioned table
-- 
2.31.1

From 3f3793672978c31efee0588a41d5d5b1a4175539 Mon Sep 17 00:00:00 2001
From: Tomas Vondra 
Date: Sat, 11 Dec 2021 22:38:08 +0100
Subject: [PATCH 2/4] Build inherited extended stats on partitioned tables

Commit 859b3003de disabled building of extended stats for inheritance
trees, to prevent updating the same catalog row twice. That resolved the
issue, but it also means there are no extended stats for partitioned
tables, because the (!inh) case has empty sample. This is a regression,
affecting queries that don't estimate the leaf relations directly.

But because partitioned tables are empty, we can invert the condition
and build statistics only for the case with inheritance, without losing
anything. And we can consider them when calculating estimates.

Report and patch by Justin Pryzby, minor fixes and cleanup by me.
Backpatch all the way back to PostgreSQL 10, where extended statistics
were introduced (same as 859b3003de).

Author: Justin Pryzby
Reported-by: Justin Pryzby
Backpatch-through: 10
Discussion: https://postgr.es/m/20210923212624.GI831%40telsasoft.com
---
 src/backend/commands/analyze.c  | 18 ++-
 src/backend/statistics/dependencies.c   |  9 +---
 src/backend/statistics/extended_stats.c |  9 +---
 src/backend/utils/adt/selfuncs.c| 30 +++--
 src/test/regress/expected/stats_ext.out | 19 
 src/test/regress/sql/stats_ext.sql  | 10 +
 6 files changed, 76 insertions(+), 19 deletions(-)

diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index cd77907fc7..58a82e4929 100644
--- a/src/backend/commands/analyze.c
+++ b/src/backend/commands/analyze.c
@@ -549,6 +549,7 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
 	{
 		MemoryContext col_context,
 	old_context;
+		bool		build_ext_stats;
 
 		pgstat_progress_update_param(PROGRESS_ANALYZE_PHASE,
 	 PROGRESS_ANALYZE_PHASE_COMPUTE_STATS);
@@ -612,13 +613,28 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
 			thisdata->attr_cnt, thisdata->vacattrstats);
 		}
 
+		/*
+		 * Should we build extended statistics for this relation?
+		 *
+		 * The extended statistics catalog does not include an inheritance
+		 * flag, so we can't store statistics built both with and without
+		 * data from child relations. We can store just one set of statistics
+		 * per relation. For plain relations that's fine, but for inheritance
+		 * trees we have to pick whether to store statistics for just the
+		 * one relation or the whole tree. For plain inheritance we store
+		 * the (!inh) version, mostly for backwards compatibility reasons.
+		 * For partitioned tables that's pointless (the non-leaf tables are
+		 * always empty), so we store stats representing the whole tree.
+		 */
+		build_ext_stats = (onerel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE) ? inh : (!inh);
+
 		/*
 		 * Build extended statistics (if there are any).
 		 *
 		 * For now we only build extended statistics on individual relations,
 		 * not for relations representing inheritance trees.
 		 */
-		if (!inh

Re: extended stats on partitioned tables

2021-12-12 Thread Tom Lane
Tomas Vondra  writes:
> On 12/12/21 16:37, Zhihong Yu wrote:
>> Since the rte (RangeTblEntry*) doesn't seem to be used beyond checking 
>> inh, I think it would be better if the above style of checking is used 
>> throughout the patch (without introducing rte variable).

> It's mostly a matter of personal taste, but I always found this style of 
> condition (i.e. dereferencing a pointer returned by a function) much 
> less readable. It's hard to parse what exactly is happening, what struct 
> type are we dealing with, etc. YMMV but the separate variable makes it 
> much clearer for me. And I'd expect the compilers to produce pretty much 
> the same code too for those cases.

FWIW, I agree.  Also, it's possible that future patches would create a
need to touch the RTE again nearby, in which case having the variable
makes it easier to write non-crummy code for that.

regards, tom lane




Re: extended stats on partitioned tables

2021-12-12 Thread Justin Pryzby
+  * XXX Can't we simply look at rte->inh?
+  */
+ inh = root->append_rel_array == NULL ? false :
+ root->append_rel_array[onerel->relid]->parent_relid != 0;

I think so.  That's what I came up with while trying to figured this out, and
it's no great surprise that it needed to be cleaned up - thanks.

In your 0003 patch, the "if inh: break" isn't removed from examine_variable(),
but the corresponding thing is removed everywhere else.

In 0003, mcv_clauselist_selectivity still uses simple_rte_array rather than
rt_fetch.

The regression tests changed as a result of not populating stx_data; I think
it's may be better to update like this:

SELECT stxname, stxdndistinct, stxddependencies, stxdmcv, stxoid IS NULL
  FROM pg_statistic_ext s LEFT JOIN pg_statistic_ext_data d
  ON d.stxoid = s.oid
  WHERE s.stxname = 'ab1_a_b_stats';

There's this part about documentation for the changes in backbranches:

On Sun, Sep 26, 2021 at 03:25:50PM -0500, Justin Pryzby wrote:
> Also, I think in backbranches we should document what's being stored in
> pg_statistic_ext, since it's pretty unintuitive:
>  - noninherted stats (FROM ONLY) for inheritence parents;
>  - inherted stats (FROM *) for partitioned tables;

spellcheck: inheritence should be inheritance.

All for now.  I'm going to update the regression tests for dependencies and the
other code paths.

-- 
Justin




Re: extended stats on partitioned tables

2021-12-12 Thread Tomas Vondra

On 12/12/21 16:37, Zhihong Yu wrote:


Hi,
For patch 1, minor comment:

+           if (planner_rt_fetch(onerel->relid, root)->inh)

Since the rte (RangeTblEntry*) doesn't seem to be used beyond checking 
inh, I think it would be better if the above style of checking is used 
throughout the patch (without introducing rte variable).




It's mostly a matter of personal taste, but I always found this style of 
condition (i.e. dereferencing a pointer returned by a function) much 
less readable. It's hard to parse what exactly is happening, what struct 
type are we dealing with, etc. YMMV but the separate variable makes it 
much clearer for me. And I'd expect the compilers to produce pretty much 
the same code too for those cases.



regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: extended stats on partitioned tables

2021-12-12 Thread Tomas Vondra

On 12/12/21 14:47, Zhihong Yu wrote:



On Sat, Dec 11, 2021 at 9:14 PM Tomas Vondra 
mailto:tomas.von...@enterprisedb.com>> 
wrote:

>

...

 > +       /* skip statistics with mismatching stxdinherit value */
 > +       if (stat->inherit != rte->inh)
 >
 > Should a log be added for the above case ?
 >

Why should we log this? It's an entirely expected case - there's a
mismatch between inheritance for the relation and statistics, simply
skipping it is the right thing to do.


Hi,
I agree that skipping should be fine (to avoid too much logging).



I'm not sure it's related to the amount of logging, really. It'd be just 
noise without any practical use, even for debugging purposes. If you 
have an inheritance tree, it'll automatically have one set of statistics 
for inh=true and one for inh=false. And this condition will always skip 
one of those, depending on what query is being estimated.



regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: extended stats on partitioned tables

2021-12-12 Thread Zhihong Yu
On Sat, Dec 11, 2021 at 8:17 PM Tomas Vondra 
wrote:

> Hi,
>
> Attached is a rebased and cleaned-up version of these patches, with more
> comments, refactorings etc. Justin and Zhihong, can you take a look?
>
>
> 0001 - Ignore extended statistics for inheritance trees
>
> 0002 - Build inherited extended stats on partitioned tables
>
> Those are mostly just Justin's patches, with more detailed comments and
> updated commit message. I've considered moving the rel->inh check to
> statext_clauselist_selectivity, and then removing the check from
> dependencies and MCV. But I decided no to do that, because someone might
> be calling those functions directly (even if that's very unlikely).
>
> The one thing bugging me a bit is that the regression test checks only a
> GROUP BY query. It'd be nice to add queries testing MCV/dependencies
> too, but that seems tricky because most queries will use per-partitions
> stats.
>
>
> 0003 - Add stxdinherit flag to pg_statistic_ext_data
>
> This is the patch for master, allowing to build stats for both inherits
> flag values. It adds the flag to pg_stats_ext_exprs view to, reworked
> how we deal with iterating both flags etc. I've adopted most of the
> Justin's fixup patches, except that in plancat.c I've refactored how we
> load the stats to process keys/expressions just once.
>
> It has the same issue with regression test using just a GROUP BY query,
> but if we add a test to 0001/0002, that'll fix this too.
>
>
> 0004 - Refactor parent ACL check
>
> Not sure about this - I doubt saving 30 rows in an 8kB file is really
> worth it. Maybe it is, but then maybe we should try cleaning up the
> other ACL checks in this file too? Seems mostly orthogonal to this
> thread, though.
>
> Hi,
For patch 1, minor comment:

+   if (planner_rt_fetch(onerel->relid, root)->inh)

Since the rte (RangeTblEntry*) doesn't seem to be used beyond checking inh,
I think it would be better if the above style of checking is used
throughout the patch (without introducing rte variable).

Cheers

>
> regards
>
> --
> Tomas Vondra
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company


Re: extended stats on partitioned tables

2021-12-12 Thread Zhihong Yu
On Sat, Dec 11, 2021 at 9:14 PM Tomas Vondra 
wrote:

>
>
> On 12/12/21 05:38, Zhihong Yu wrote:
> >
> >
> > On Sat, Dec 11, 2021 at 8:17 PM Tomas Vondra
> > mailto:tomas.von...@enterprisedb.com>>
> > wrote:
> >
> > Hi,
> >
> > Attached is a rebased and cleaned-up version of these patches, with
> more
> > comments, refactorings etc. Justin and Zhihong, can you take a look?
> >
> >
> > 0001 - Ignore extended statistics for inheritance trees
> >
> > 0002 - Build inherited extended stats on partitioned tables
> >
> > Those are mostly just Justin's patches, with more detailed comments
> and
> > updated commit message. I've considered moving the rel->inh check to
> > statext_clauselist_selectivity, and then removing the check from
> > dependencies and MCV. But I decided no to do that, because someone
> might
> > be calling those functions directly (even if that's very unlikely).
> >
> > The one thing bugging me a bit is that the regression test checks
> only a
> > GROUP BY query. It'd be nice to add queries testing MCV/dependencies
> > too, but that seems tricky because most queries will use
> per-partitions
> > stats.
> >
> >
> > 0003 - Add stxdinherit flag to pg_statistic_ext_data
> >
> > This is the patch for master, allowing to build stats for both
> inherits
> > flag values. It adds the flag to pg_stats_ext_exprs view to, reworked
> > how we deal with iterating both flags etc. I've adopted most of the
> > Justin's fixup patches, except that in plancat.c I've refactored how
> we
> > load the stats to process keys/expressions just once.
> >
> > It has the same issue with regression test using just a GROUP BY
> query,
> > but if we add a test to 0001/0002, that'll fix this too.
> >
> >
> > 0004 - Refactor parent ACL check
> >
> > Not sure about this - I doubt saving 30 rows in an 8kB file is really
> > worth it. Maybe it is, but then maybe we should try cleaning up the
> > other ACL checks in this file too? Seems mostly orthogonal to this
> > thread, though.
> >
> >
> > Hi,
> > For patch 3, in commit message:
> >
> > and there no clear winner. -> and there is no clear winner.
> >
> > and it seem wasteful -> and it seems wasteful
> >
> > The there may be -> There may be
> >
>
> Thanks, will fix.
>
> > +   /* skip statistics with mismatching stxdinherit value */
> > +   if (stat->inherit != rte->inh)
> >
> > Should a log be added for the above case ?
> >
>
> Why should we log this? It's an entirely expected case - there's a
> mismatch between inheritance for the relation and statistics, simply
> skipping it is the right thing to do.
>

Hi,
I agree that skipping should be fine (to avoid too much logging).

Thanks


> > +* Determine if we'redealing with inheritance tree.
> >
> > There should be a space between re and dealing.
> >
>
> Thanks, will fix.
>
>
> regards
>
> --
> Tomas Vondra
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>


Re: extended stats on partitioned tables

2021-12-11 Thread Tomas Vondra



On 12/12/21 05:38, Zhihong Yu wrote:
> 
> 
> On Sat, Dec 11, 2021 at 8:17 PM Tomas Vondra
> mailto:tomas.von...@enterprisedb.com>>
> wrote:
> 
> Hi,
> 
> Attached is a rebased and cleaned-up version of these patches, with more
> comments, refactorings etc. Justin and Zhihong, can you take a look?
> 
> 
> 0001 - Ignore extended statistics for inheritance trees
> 
> 0002 - Build inherited extended stats on partitioned tables
> 
> Those are mostly just Justin's patches, with more detailed comments and
> updated commit message. I've considered moving the rel->inh check to
> statext_clauselist_selectivity, and then removing the check from
> dependencies and MCV. But I decided no to do that, because someone might
> be calling those functions directly (even if that's very unlikely).
> 
> The one thing bugging me a bit is that the regression test checks only a
> GROUP BY query. It'd be nice to add queries testing MCV/dependencies
> too, but that seems tricky because most queries will use per-partitions
> stats.
> 
> 
> 0003 - Add stxdinherit flag to pg_statistic_ext_data
> 
> This is the patch for master, allowing to build stats for both inherits
> flag values. It adds the flag to pg_stats_ext_exprs view to, reworked
> how we deal with iterating both flags etc. I've adopted most of the
> Justin's fixup patches, except that in plancat.c I've refactored how we
> load the stats to process keys/expressions just once.
> 
> It has the same issue with regression test using just a GROUP BY query,
> but if we add a test to 0001/0002, that'll fix this too.
> 
> 
> 0004 - Refactor parent ACL check
> 
> Not sure about this - I doubt saving 30 rows in an 8kB file is really
> worth it. Maybe it is, but then maybe we should try cleaning up the
> other ACL checks in this file too? Seems mostly orthogonal to this
> thread, though.
> 
> 
> Hi,
> For patch 3, in commit message:
> 
> and there no clear winner. -> and there is no clear winner. 
> 
> and it seem wasteful -> and it seems wasteful
> 
> The there may be -> There may be
> 

Thanks, will fix.

> +       /* skip statistics with mismatching stxdinherit value */
> +       if (stat->inherit != rte->inh)
> 
> Should a log be added for the above case ?
> 

Why should we log this? It's an entirely expected case - there's a
mismatch between inheritance for the relation and statistics, simply
skipping it is the right thing to do.

> +                    * Determine if we'redealing with inheritance tree.
> 
> There should be a space between re and dealing.
> 

Thanks, will fix.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: extended stats on partitioned tables

2021-12-11 Thread Zhihong Yu
On Sat, Dec 11, 2021 at 8:17 PM Tomas Vondra 
wrote:

> Hi,
>
> Attached is a rebased and cleaned-up version of these patches, with more
> comments, refactorings etc. Justin and Zhihong, can you take a look?
>
>
> 0001 - Ignore extended statistics for inheritance trees
>
> 0002 - Build inherited extended stats on partitioned tables
>
> Those are mostly just Justin's patches, with more detailed comments and
> updated commit message. I've considered moving the rel->inh check to
> statext_clauselist_selectivity, and then removing the check from
> dependencies and MCV. But I decided no to do that, because someone might
> be calling those functions directly (even if that's very unlikely).
>
> The one thing bugging me a bit is that the regression test checks only a
> GROUP BY query. It'd be nice to add queries testing MCV/dependencies
> too, but that seems tricky because most queries will use per-partitions
> stats.
>
>
> 0003 - Add stxdinherit flag to pg_statistic_ext_data
>
> This is the patch for master, allowing to build stats for both inherits
> flag values. It adds the flag to pg_stats_ext_exprs view to, reworked
> how we deal with iterating both flags etc. I've adopted most of the
> Justin's fixup patches, except that in plancat.c I've refactored how we
> load the stats to process keys/expressions just once.
>
> It has the same issue with regression test using just a GROUP BY query,
> but if we add a test to 0001/0002, that'll fix this too.
>
>
> 0004 - Refactor parent ACL check
>
> Not sure about this - I doubt saving 30 rows in an 8kB file is really
> worth it. Maybe it is, but then maybe we should try cleaning up the
> other ACL checks in this file too? Seems mostly orthogonal to this
> thread, though.
>
>
> Hi,
For patch 3, in commit message:

and there no clear winner. -> and there is no clear winner.

and it seem wasteful -> and it seems wasteful

The there may be -> There may be

+   /* skip statistics with mismatching stxdinherit value */
+   if (stat->inherit != rte->inh)

Should a log be added for the above case ?

+* Determine if we'redealing with inheritance tree.

There should be a space between re and dealing.

Cheers

regards
>
> --
> Tomas Vondra
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company


Re: extended stats on partitioned tables

2021-12-11 Thread Tomas Vondra
Hi,

Attached is a rebased and cleaned-up version of these patches, with more
comments, refactorings etc. Justin and Zhihong, can you take a look?


0001 - Ignore extended statistics for inheritance trees

0002 - Build inherited extended stats on partitioned tables

Those are mostly just Justin's patches, with more detailed comments and
updated commit message. I've considered moving the rel->inh check to
statext_clauselist_selectivity, and then removing the check from
dependencies and MCV. But I decided no to do that, because someone might
be calling those functions directly (even if that's very unlikely).

The one thing bugging me a bit is that the regression test checks only a
GROUP BY query. It'd be nice to add queries testing MCV/dependencies
too, but that seems tricky because most queries will use per-partitions
stats.


0003 - Add stxdinherit flag to pg_statistic_ext_data

This is the patch for master, allowing to build stats for both inherits
flag values. It adds the flag to pg_stats_ext_exprs view to, reworked
how we deal with iterating both flags etc. I've adopted most of the
Justin's fixup patches, except that in plancat.c I've refactored how we
load the stats to process keys/expressions just once.

It has the same issue with regression test using just a GROUP BY query,
but if we add a test to 0001/0002, that'll fix this too.


0004 - Refactor parent ACL check

Not sure about this - I doubt saving 30 rows in an 8kB file is really
worth it. Maybe it is, but then maybe we should try cleaning up the
other ACL checks in this file too? Seems mostly orthogonal to this
thread, though.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL CompanyFrom 569099c5d14535a95cc21798cb041cae7eee0e36 Mon Sep 17 00:00:00 2001
From: Tomas Vondra 
Date: Sun, 12 Dec 2021 02:28:41 +0100
Subject: [PATCH 4/4] Refactor parent ACL check

selfuncs.c is 8k lines long, and this makes it 30 LOC shorter.
---
 src/backend/utils/adt/selfuncs.c | 140 ---
 1 file changed, 52 insertions(+), 88 deletions(-)

diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index 886d85b7f3..9f656316bf 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -187,6 +187,8 @@ static char *convert_string_datum(Datum value, Oid typid, Oid collid,
   bool *failure);
 static double convert_timevalue_to_scalar(Datum value, Oid typid,
 		  bool *failure);
+static void recheck_parent_acl(PlannerInfo *root, VariableStatData *vardata,
+Oid relid);
 static void examine_simple_variable(PlannerInfo *root, Var *var,
 	VariableStatData *vardata);
 static bool get_variable_range(PlannerInfo *root, VariableStatData *vardata,
@@ -5153,51 +5155,7 @@ examine_variable(PlannerInfo *root, Node *node, int varRelid,
 	(pg_class_aclcheck(rte->relid, userid,
 	   ACL_SELECT) == ACLCHECK_OK);
 
-/*
- * If the user doesn't have permissions to
- * access an inheritance child relation, check
- * the permissions of the table actually
- * mentioned in the query, since most likely
- * the user does have that permission.  Note
- * that whole-table select privilege on the
- * parent doesn't quite guarantee that the
- * user could read all columns of the child.
- * But in practice it's unlikely that any
- * interesting security violation could result
- * from allowing access to the expression
- * index's stats, so we allow it anyway.  See
- * similar code in examine_simple_variable()
- * for additional comments.
- */
-if (!vardata->acl_ok &&
-	root->append_rel_array != NULL)
-{
-	AppendRelInfo *appinfo;
-	Index		varno = index->rel->relid;
-
-	appinfo = root->append_rel_array[varno];
-	while (appinfo &&
-		   planner_rt_fetch(appinfo->parent_relid,
-			root)->rtekind == RTE_RELATION)
-	{
-		varno = appinfo->parent_relid;
-		appinfo = root->append_rel_array[varno];
-	}
-	if (varno != index->rel->relid)
-	{
-		/* Repeat access check on this rel */
-		rte = planner_rt_fetch(varno, root);
-		Assert(rte->rtekind == RTE_RELATION);
-
-		userid = rte->checkAsUser ? rte->checkAsUser : GetUserId();
-
-		vardata->acl_ok =
-			rte->securityQuals == NIL &&
-			(pg_class_aclcheck(rte->relid,
-			   userid,
-			   ACL_SELECT) == ACLCHECK_OK);
-	}
-}
+recheck_parent_acl(root, vardata, index->rel->relid);
 			}
 			else
 			{
@@ -5302,49 +5260,7 @@ examine_variable(PlannerInfo *root, Node *node, int varRelid,
 		(pg_class_aclcheck(rte->relid, userid,
 		   ACL_S

Re: extended stats on partitioned tables

2021-12-03 Thread Zhihong Yu
On Thu, Dec 2, 2021 at 9:24 PM Justin Pryzby  wrote:

> On Thu, Nov 04, 2021 at 12:44:45AM +0100, Tomas Vondra wrote:
> > >> And I'm not sure we do the right thing after removing children, for
> example
> > >> (that should drop the inheritance stats, I guess).
>
> > > Do you mean for inheritance only ?  Or partitions too ?
> > > I think for partitions, the stats should stay.
> > > And for inheritence, they can stay, for consistency with partitions,
> and since
> > > it does no harm.
> >
> > I think the behavior should be the same as for data in pg_statistic,
> > i.e. if we keep/remove those, we should do the same thing for extended
> > statistics.
>
> That works for column stats the way I proposed for extended stats: child
> stats
> are never removed, neither when the only child is dropped, nor when
> re-running
> analyze (that part is actually a bit odd).
>
> Rebased, fixing an intermediate compile error, and typos in the commit
> message.
>
> --
> Justin
>
Hi,

+   if (!HeapTupleIsValid(tup)) /* should not happen */
+   // elog(ERROR, "cache lookup failed for statistics data %u",
statsOid);

You may want to remove commented out code.

+   for (i = 0; i < staForm->stxkeys.dim1; i++)
+   keys = bms_add_member(keys, staForm->stxkeys.values[i]);

Since the above code is in a loop now, should keys be cleared across the
outer loop iterations ?

Cheers


Re: extended stats on partitioned tables

2021-12-02 Thread Justin Pryzby
STICS ab1_a_b_stats ON a, b FROM ab1;
 ANALYZE ab1;
 DROP TABLE ab1 CASCADE;
 NOTICE:  drop cascades to table ab1c
+-- Tests for stats with inheritance
+CREATE TABLE stxdinh(i int, j int);
+CREATE TABLE stxdinh1() INHERITS(stxdinh);
+INSERT INTO stxdinh SELECT a, a/10 FROM generate_series(1,9)a;
+INSERT INTO stxdinh1 SELECT a, a FROM generate_series(1,999)a;
+VACUUM ANALYZE stxdinh, stxdinh1;
+-- Ensure non-inherited stats are not applied to inherited query
+-- Without stats object, it looks like this
+SELECT * FROM check_estimated_rows('SELECT * FROM stxdinh* GROUP BY 1,2');
+ estimated | actual 
+---+
+  1000 |   1008
+(1 row)
+
+CREATE STATISTICS stxdinh ON i,j FROM stxdinh;
+VACUUM ANALYZE stxdinh, stxdinh1;
+-- Since the stats object does not include inherited stats, it should not affect the estimates
+SELECT * FROM check_estimated_rows('SELECT * FROM stxdinh* GROUP BY 1,2');
+ estimated | actual 
+---+
+  1000 |   1008
+(1 row)
+
+DROP TABLE stxdinh, stxdinh1;
 -- basic test for statistics on expressions
 CREATE TABLE ab1 (a INTEGER, b INTEGER, c TIMESTAMP, d TIMESTAMPTZ);
 -- expression stats may be built on a single expression column
diff --git a/src/test/regress/sql/stats_ext.sql b/src/test/regress/sql/stats_ext.sql
index 6fb37962a7..a196d5e2f8 100644
--- a/src/test/regress/sql/stats_ext.sql
+++ b/src/test/regress/sql/stats_ext.sql
@@ -112,6 +112,21 @@ CREATE STATISTICS ab1_a_b_stats ON a, b FROM ab1;
 ANALYZE ab1;
 DROP TABLE ab1 CASCADE;
 
+-- Tests for stats with inheritance
+CREATE TABLE stxdinh(i int, j int);
+CREATE TABLE stxdinh1() INHERITS(stxdinh);
+INSERT INTO stxdinh SELECT a, a/10 FROM generate_series(1,9)a;
+INSERT INTO stxdinh1 SELECT a, a FROM generate_series(1,999)a;
+VACUUM ANALYZE stxdinh, stxdinh1;
+-- Ensure non-inherited stats are not applied to inherited query
+-- Without stats object, it looks like this
+SELECT * FROM check_estimated_rows('SELECT * FROM stxdinh* GROUP BY 1,2');
+CREATE STATISTICS stxdinh ON i,j FROM stxdinh;
+VACUUM ANALYZE stxdinh, stxdinh1;
+-- Since the stats object does not include inherited stats, it should not affect the estimates
+SELECT * FROM check_estimated_rows('SELECT * FROM stxdinh* GROUP BY 1,2');
+DROP TABLE stxdinh, stxdinh1;
+
 -- basic test for statistics on expressions
 CREATE TABLE ab1 (a INTEGER, b INTEGER, c TIMESTAMP, d TIMESTAMPTZ);
 
-- 
2.17.0

>From 3462a9cf1f3e67c4c42b235136066e03890570ed Mon Sep 17 00:00:00 2001
From: Tomas Vondra 
Date: Sat, 25 Sep 2021 23:01:21 +0200
Subject: [PATCH 02/15] Build inherited extended stats on partitioned tables

Since 859b3003de, ext stats on partitioned tables are not built, which is a
regression.  For back branches, pg_statistic_ext cannot support both inherited
(FROM) and non-inherited (FROM ONLY) stats on inheritance heirarchies.
But there's no issue building inherited stats for partitioned tables, which
are empty, so cannot have non-inherited stats.

See also: 8c5cdb7f4f6e1d6a6104cb58ce4f23453891651b

https://www.postgresql.org/message-id/20210923212624.GI831%40telsasoft.com

Backpatch to v10
---
 src/backend/commands/analyze.c  |  5 -
 src/backend/statistics/dependencies.c   |  2 +-
 src/backend/statistics/extended_stats.c |  2 +-
 src/backend/utils/adt/selfuncs.c|  9 ++---
 src/test/regress/expected/stats_ext.out | 19 +++
 src/test/regress/sql/stats_ext.sql  | 10 ++
 6 files changed, 41 insertions(+), 6 deletions(-)

diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index cd77907fc7..d35325f560 100644
--- a/src/backend/commands/analyze.c
+++ b/src/backend/commands/analyze.c
@@ -549,6 +549,7 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
 	{
 		MemoryContext col_context,
 	old_context;
+		bool		build_ext_stats;
 
 		pgstat_progress_update_param(PROGRESS_ANALYZE_PHASE,
 	 PROGRESS_ANALYZE_PHASE_COMPUTE_STATS);
@@ -612,13 +613,15 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
 			thisdata->attr_cnt, thisdata->vacattrstats);
 		}
 
+		build_ext_stats = (onerel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE) ? inh : (!inh);
+
 		/*
 		 * Build extended statistics (if there are any).
 		 *
 		 * For now we only build extended statistics on individual relations,
 		 * not for relations representing inheritance trees.
 		 */
-		if (!inh)
+		if (build_ext_stats)
 			BuildRelationExtStatistics(onerel, totalrows, numrows, rows,
 	   attr_cnt, vacattrstats);
 	}
diff --git a/src/backend/statistics/dependencies.c b/src/backend/statistics/dependencies.c
index 015b9e28f6..36121d5a27 100644
--- a/src/backend/statistics/dependencies.c
+++ b/src/backend/statistics/dependencies.c
@@ -1417,7 +1417,7 @@ dependencies_clauselist_selectivity(PlannerInfo *root,
 	int			unique_exprs_cnt;
 
 	/* If it's an inheritence tree, skip statistics (which do not include child stats) */
-	if (rte->inh)
+	if (rte->inh && rte->relkind 

Re: extended stats on partitioned tables

2021-11-03 Thread Justin Pryzby
TblEntry		*rte = root->simple_rte_array[rel->relid];
+
+	/* If it's an inheritence tree, skip statistics (which do not include child stats) */
+	if (rte->inh)
+		return false;
 
 	/* bail out immediately if the table has no extended statistics */
 	if (!rel->statlist)
@@ -5232,6 +5237,10 @@ examine_variable(PlannerInfo *root, Node *node, int varRelid,
 			if (vardata->statsTuple)
 break;
 
+			/* If it's an inheritence tree, skip statistics (which do not include child stats) */
+			if (planner_rt_fetch(onerel->relid, root)->inh)
+break;
+
 			/* skip stats without per-expression stats */
 			if (info->kind != STATS_EXT_EXPRESSIONS)
 continue;
diff --git a/src/test/regress/expected/stats_ext.out b/src/test/regress/expected/stats_ext.out
index c60ba45aba..5c15e44bd6 100644
--- a/src/test/regress/expected/stats_ext.out
+++ b/src/test/regress/expected/stats_ext.out
@@ -176,6 +176,29 @@ CREATE STATISTICS ab1_a_b_stats ON a, b FROM ab1;
 ANALYZE ab1;
 DROP TABLE ab1 CASCADE;
 NOTICE:  drop cascades to table ab1c
+-- Ensure non-inherited stats are not applied to inherited query
+CREATE TABLE stxdinh(i int, j int);
+CREATE TABLE stxdinh1() INHERITS(stxdinh);
+INSERT INTO stxdinh SELECT a, a/10 FROM generate_series(1,9)a;
+INSERT INTO stxdinh1 SELECT a, a FROM generate_series(1,999)a;
+VACUUM ANALYZE stxdinh, stxdinh1;
+-- Without stats object, it looks like this
+SELECT * FROM check_estimated_rows('SELECT * FROM stxdinh* GROUP BY 1,2');
+ estimated | actual 
+---+
+  1000 |   1008
+(1 row)
+
+CREATE STATISTICS stxdinh ON i,j FROM stxdinh;
+VACUUM ANALYZE stxdinh, stxdinh1;
+-- Since the stats object does not include inherited stats, it should not affect the estimates
+SELECT * FROM check_estimated_rows('SELECT * FROM stxdinh* GROUP BY 1,2');
+ estimated | actual 
+---+
+  1000 |   1008
+(1 row)
+
+DROP TABLE stxdinh, stxdinh1;
 -- basic test for statistics on expressions
 CREATE TABLE ab1 (a INTEGER, b INTEGER, c TIMESTAMP, d TIMESTAMPTZ);
 -- expression stats may be built on a single expression column
diff --git a/src/test/regress/sql/stats_ext.sql b/src/test/regress/sql/stats_ext.sql
index 6fb37962a7..610f7ed17f 100644
--- a/src/test/regress/sql/stats_ext.sql
+++ b/src/test/regress/sql/stats_ext.sql
@@ -112,6 +112,20 @@ CREATE STATISTICS ab1_a_b_stats ON a, b FROM ab1;
 ANALYZE ab1;
 DROP TABLE ab1 CASCADE;
 
+-- Ensure non-inherited stats are not applied to inherited query
+CREATE TABLE stxdinh(i int, j int);
+CREATE TABLE stxdinh1() INHERITS(stxdinh);
+INSERT INTO stxdinh SELECT a, a/10 FROM generate_series(1,9)a;
+INSERT INTO stxdinh1 SELECT a, a FROM generate_series(1,999)a;
+VACUUM ANALYZE stxdinh, stxdinh1;
+-- Without stats object, it looks like this
+SELECT * FROM check_estimated_rows('SELECT * FROM stxdinh* GROUP BY 1,2');
+CREATE STATISTICS stxdinh ON i,j FROM stxdinh;
+VACUUM ANALYZE stxdinh, stxdinh1;
+-- Since the stats object does not include inherited stats, it should not affect the estimates
+SELECT * FROM check_estimated_rows('SELECT * FROM stxdinh* GROUP BY 1,2');
+DROP TABLE stxdinh, stxdinh1;
+
 -- basic test for statistics on expressions
 CREATE TABLE ab1 (a INTEGER, b INTEGER, c TIMESTAMP, d TIMESTAMPTZ);
 
-- 
2.17.0

>From 435b154ca177053f6ddc54f7362f5c6aaa486215 Mon Sep 17 00:00:00 2001
From: Tomas Vondra 
Date: Sat, 25 Sep 2021 23:01:21 +0200
Subject: [PATCH v3 2/6] Build inherited extended stats on partitioned tables

Since 859b3003de, ext stats on partitioned tables are not built, which is a
regression.  For back branches, pg_statistic_ext cannot support both inherited
(FROM) and non-inherited (FROM ONLY) stats on inheritence heirarchies.
But there's no issue building inherited stats for partitioned tables, which
are empty, so cannot have non-inherited stats.

See also: 8c5cdb7f4f6e1d6a6104cb58ce4f23453891651b

https://www.postgresql.org/message-id/20210923212624.GI831%40telsasoft.com

Backpatch to v10
---
 src/backend/commands/analyze.c  |  5 -
 src/backend/statistics/dependencies.c   |  2 +-
 src/backend/statistics/extended_stats.c |  2 +-
 src/backend/utils/adt/selfuncs.c|  9 ++---
 src/test/regress/expected/stats_ext.out | 19 +++
 src/test/regress/sql/stats_ext.sql  | 10 ++
 6 files changed, 41 insertions(+), 6 deletions(-)

diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index 4928702aec..acc994e1e8 100644
--- a/src/backend/commands/analyze.c
+++ b/src/backend/commands/analyze.c
@@ -548,6 +548,7 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
 	{
 		MemoryContext col_context,
 	old_context;
+		bool		build_ext_stats;
 
 		pgstat_progress_update_param(PROGRESS_ANALYZE_PHASE,
 	 PROGRESS_ANALYZE_PHASE_COMPUTE_STATS);
@@ -611,13 +612,15 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
 			thisdata->attr_cnt, thisdata->vacattrstats);
 		}
 
+		build_ext_stats = (onerel->rd_rel->relkind 

Re: extended stats on partitioned tables

2021-11-03 Thread Tomas Vondra



On 11/4/21 12:19 AM, Justin Pryzby wrote:
> On Wed, Nov 03, 2021 at 11:48:44PM +0100, Tomas Vondra wrote:
>> On 10/8/21 12:45 AM, Justin Pryzby wrote:
>>> On Thu, Oct 07, 2021 at 03:26:46PM -0500, Jaime Casanova wrote:
 On Sun, Sep 26, 2021 at 03:25:50PM -0500, Justin Pryzby wrote:
> On Sat, Sep 25, 2021 at 05:31:52PM -0500, Justin Pryzby wrote:
>> It seems like your patch should also check "inh" in examine_variable and
>> statext_expressions_load.
>
> I tried adding that - I mostly kept my patches separate.
> Hopefully this is more helpful than a complication.
> I added at: https://commitfest.postgresql.org/35/3332/
>

 Actually, this is confusing. Which patch is the one we should be
 reviewing?
>>>
>>> It is confusing, but not as much as I first thought.  Please check the 
>>> commit
>>> messages.
>>>
>>> The first two patches are meant to be applied to master *and* backpatched.  
>>> The
>>> first one intends to fixes the bug that non-inherited stats are being used 
>>> for
>>> queries of inheritance trees.  The 2nd one fixes the regression that stats 
>>> are
>>> not collected for inheritence trees of partitioned tables (which is the only
>>> type of stats they could ever possibly have).
>>
>> I think 0001 and 0002 seem mostly fine, but it seems a bit strange to do
>> the (!rte->inh) check in the rel->statlist loops. AFAICS both places
>> could do that right at the beginning, because it does not depend on the
>> statistics object at all, just the RelOptInfo.
> 
> I probably did this to make the code change small, to avoid indentin the whole
> block.

But indenting the block is not necessary. It's possible to do something
like this:

if (!rel->inh)
return 1.0;

or whatever is the "default" result for that function.

> 
>>> And the 3rd+4th patches (Tomas' plus my changes) allow collecting both
>>> inherited and non-inherited stats, only in master, since it requires a 
>>> catalog
>>> change.  It's a bit confusing that patch #4 removes most what I added in
>>> patches 1 and 2.  But that's exactly what's needed to collect and apply both
>>> inherited and non-inherited stats: the first two patches avoid applying 
>>> stats
>>> collected with the wrong inheritence.  That's also what's needed for the
>>> patchset to follow the normal "apply to master and backpatch" process, 
>>> rather
>>> than 2 patches which are backpatched but not applied to master, and one 
>>> which
>>> is applied to master and not backpatched..
>>>
>>
>> Yeah. Af first I was a bit confused because after applying 0003 there
>> are both the fixes and the "correct" way, but then I realized 0004
>> removes the unnecessary bits.
> 
> This was to leave your 0003 (mostly) unchanged, so you can see and/or apply my
> changes.  They should be squished together.
> 

Yep.

>> The one thing 0003 still needs is to rework the places that need to
>> touch both inh and !inh stats. The patch simply does
>>
>> for (inh = 0; inh <= 1; inh++) { ... }
>>
>> but that feels a bit too hackish. But if we don't know which of the two
>> stats exist, I'm not sure what to do about it. 
> 
> There's also this:
> 
> On Sun, Sep 26, 2021 at 03:25:50PM -0500, Justin Pryzby wrote:
>> +   /* create only the "stxdinherit=false", because that always exists */
>> +   datavalues[Anum_pg_statistic_ext_data_stxdinherit - 1] = 
>> ObjectIdGetDatum(false);
>>
>> That'd be confusing for partitioned tables, no?
>> They'd always have an row with no data.
>> I guess it could be stxdinherit = BoolGetDatum(rel->rd_rel->relkind == 
>> RELKIND_PARTITIONED_TABLE).
>> (not ObjectIdGetDatum).
>> Then, that affects the loops which delete the tuples - neither inh nor !inh 
>> is
>> guaranteed, unless you check relkind there, too.
> 
> Maybe the for inh<=1 loop should instead be two calls to new functions 
> factored
> out of get_relation_statistics() and RemoveStatisticsById(), which take "bool
> inh".
> 

Well, yeah. That's part of the strange 1:2 mapping between the stats
definition and data. Although, even with regular stats we have such
mapping, except the "definition" is the pg_attribute row.

>> And I'm not sure we do the right thing after removing children, for example
>> (that should drop the inheritance stats, I guess).
> 
> Do you mean for inheritance only ?  Or partitions too ?
> I think for partitions, the stats should stay.
> And for inheritence, they can stay, for consistency with partitions, and since
> it does no harm.
> 

I think the behavior should be the same as for data in pg_statistic,
i.e. if we keep/remove those, we should do the same thing for extended
statistics.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: extended stats on partitioned tables

2021-11-03 Thread Justin Pryzby
On Wed, Nov 03, 2021 at 11:48:44PM +0100, Tomas Vondra wrote:
> On 10/8/21 12:45 AM, Justin Pryzby wrote:
> > On Thu, Oct 07, 2021 at 03:26:46PM -0500, Jaime Casanova wrote:
> >> On Sun, Sep 26, 2021 at 03:25:50PM -0500, Justin Pryzby wrote:
> >>> On Sat, Sep 25, 2021 at 05:31:52PM -0500, Justin Pryzby wrote:
>  It seems like your patch should also check "inh" in examine_variable and
>  statext_expressions_load.
> >>>
> >>> I tried adding that - I mostly kept my patches separate.
> >>> Hopefully this is more helpful than a complication.
> >>> I added at: https://commitfest.postgresql.org/35/3332/
> >>>
> >>
> >> Actually, this is confusing. Which patch is the one we should be
> >> reviewing?
> > 
> > It is confusing, but not as much as I first thought.  Please check the 
> > commit
> > messages.
> > 
> > The first two patches are meant to be applied to master *and* backpatched.  
> > The
> > first one intends to fixes the bug that non-inherited stats are being used 
> > for
> > queries of inheritance trees.  The 2nd one fixes the regression that stats 
> > are
> > not collected for inheritence trees of partitioned tables (which is the only
> > type of stats they could ever possibly have).
> 
> I think 0001 and 0002 seem mostly fine, but it seems a bit strange to do
> the (!rte->inh) check in the rel->statlist loops. AFAICS both places
> could do that right at the beginning, because it does not depend on the
> statistics object at all, just the RelOptInfo.

I probably did this to make the code change small, to avoid indentin the whole
block.

> > And the 3rd+4th patches (Tomas' plus my changes) allow collecting both
> > inherited and non-inherited stats, only in master, since it requires a 
> > catalog
> > change.  It's a bit confusing that patch #4 removes most what I added in
> > patches 1 and 2.  But that's exactly what's needed to collect and apply both
> > inherited and non-inherited stats: the first two patches avoid applying 
> > stats
> > collected with the wrong inheritence.  That's also what's needed for the
> > patchset to follow the normal "apply to master and backpatch" process, 
> > rather
> > than 2 patches which are backpatched but not applied to master, and one 
> > which
> > is applied to master and not backpatched..
> > 
> 
> Yeah. Af first I was a bit confused because after applying 0003 there
> are both the fixes and the "correct" way, but then I realized 0004
> removes the unnecessary bits.

This was to leave your 0003 (mostly) unchanged, so you can see and/or apply my
changes.  They should be squished together.

> The one thing 0003 still needs is to rework the places that need to
> touch both inh and !inh stats. The patch simply does
> 
> for (inh = 0; inh <= 1; inh++) { ... }
> 
> but that feels a bit too hackish. But if we don't know which of the two
> stats exist, I'm not sure what to do about it. 

There's also this:

On Sun, Sep 26, 2021 at 03:25:50PM -0500, Justin Pryzby wrote:
> +   /* create only the "stxdinherit=false", because that always exists */
> +   datavalues[Anum_pg_statistic_ext_data_stxdinherit - 1] = 
> ObjectIdGetDatum(false);
> 
> That'd be confusing for partitioned tables, no?
> They'd always have an row with no data.
> I guess it could be stxdinherit = BoolGetDatum(rel->rd_rel->relkind == 
> RELKIND_PARTITIONED_TABLE).
> (not ObjectIdGetDatum).
> Then, that affects the loops which delete the tuples - neither inh nor !inh is
> guaranteed, unless you check relkind there, too.

Maybe the for inh<=1 loop should instead be two calls to new functions factored
out of get_relation_statistics() and RemoveStatisticsById(), which take "bool
inh".

> And I'm not sure we do the right thing after removing children, for example
> (that should drop the inheritance stats, I guess).

Do you mean for inheritance only ?  Or partitions too ?
I think for partitions, the stats should stay.
And for inheritence, they can stay, for consistency with partitions, and since
it does no harm.




Re: extended stats on partitioned tables

2021-11-03 Thread Tomas Vondra
On 10/8/21 12:45 AM, Justin Pryzby wrote:
> On Thu, Oct 07, 2021 at 03:26:46PM -0500, Jaime Casanova wrote:
>> On Sun, Sep 26, 2021 at 03:25:50PM -0500, Justin Pryzby wrote:
>>> On Sat, Sep 25, 2021 at 05:31:52PM -0500, Justin Pryzby wrote:
 It seems like your patch should also check "inh" in examine_variable and
 statext_expressions_load.
>>>
>>> I tried adding that - I mostly kept my patches separate.
>>> Hopefully this is more helpful than a complication.
>>> I added at: https://commitfest.postgresql.org/35/3332/
>>>
>>
>> Actually, this is confusing. Which patch is the one we should be
>> reviewing?
> 
> It is confusing, but not as much as I first thought.  Please check the commit
> messages.
> 
> The first two patches are meant to be applied to master *and* backpatched.  
> The
> first one intends to fixes the bug that non-inherited stats are being used for
> queries of inheritance trees.  The 2nd one fixes the regression that stats are
> not collected for inheritence trees of partitioned tables (which is the only
> type of stats they could ever possibly have).
> 

I think 0001 and 0002 seem mostly fine, but it seems a bit strange to do
the (!rte->inh) check in the rel->statlist loops. AFAICS both places
could do that right at the beginning, because it does not depend on the
statistics object at all, just the RelOptInfo.

> And the 3rd+4th patches (Tomas' plus my changes) allow collecting both
> inherited and non-inherited stats, only in master, since it requires a catalog
> change.  It's a bit confusing that patch #4 removes most what I added in
> patches 1 and 2.  But that's exactly what's needed to collect and apply both
> inherited and non-inherited stats: the first two patches avoid applying stats
> collected with the wrong inheritence.  That's also what's needed for the
> patchset to follow the normal "apply to master and backpatch" process, rather
> than 2 patches which are backpatched but not applied to master, and one which
> is applied to master and not backpatched..
> 

Yeah. Af first I was a bit confused because after applying 0003 there
are both the fixes and the "correct" way, but then I realized 0004
removes the unnecessary bits.

The one thing 0003 still needs is to rework the places that need to
touch both inh and !inh stats. The patch simply does

for (inh = 0; inh <= 1; inh++) { ... }

but that feels a bit too hackish. But if we don't know which of the two
stats exist, I'm not sure what to do about it. And I'm not sure we do
the right thing after removing children, for example (that should drop
the inheritance stats, I guess).

The 1:2 mapping between pg_statistic_ext and pg_statistic_ext_data is a
bit strange, but I can't think of a better way.


> @Tomas: I just found commit 427c6b5b9, which is a remarkably similar issue
> affecting column stats 15 years ago.
> 

What can I say? The history repeats itself ...


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: extended stats on partitioned tables

2021-10-07 Thread Justin Pryzby
e tree, skip statistics (which do not include child stats) */
+	if (rte->inh)
+		return false;
 
 	/* bail out immediately if the table has no extended statistics */
 	if (!rel->statlist)
@@ -5232,6 +5237,10 @@ examine_variable(PlannerInfo *root, Node *node, int varRelid,
 			if (vardata->statsTuple)
 break;
 
+			/* If it's an inheritence tree, skip statistics (which do not include child stats) */
+			if (planner_rt_fetch(onerel->relid, root)->inh)
+break;
+
 			/* skip stats without per-expression stats */
 			if (info->kind != STATS_EXT_EXPRESSIONS)
 continue;
diff --git a/src/test/regress/expected/stats_ext.out b/src/test/regress/expected/stats_ext.out
index c60ba45aba..5c15e44bd6 100644
--- a/src/test/regress/expected/stats_ext.out
+++ b/src/test/regress/expected/stats_ext.out
@@ -176,6 +176,29 @@ CREATE STATISTICS ab1_a_b_stats ON a, b FROM ab1;
 ANALYZE ab1;
 DROP TABLE ab1 CASCADE;
 NOTICE:  drop cascades to table ab1c
+-- Ensure non-inherited stats are not applied to inherited query
+CREATE TABLE stxdinh(i int, j int);
+CREATE TABLE stxdinh1() INHERITS(stxdinh);
+INSERT INTO stxdinh SELECT a, a/10 FROM generate_series(1,9)a;
+INSERT INTO stxdinh1 SELECT a, a FROM generate_series(1,999)a;
+VACUUM ANALYZE stxdinh, stxdinh1;
+-- Without stats object, it looks like this
+SELECT * FROM check_estimated_rows('SELECT * FROM stxdinh* GROUP BY 1,2');
+ estimated | actual 
+---+
+  1000 |   1008
+(1 row)
+
+CREATE STATISTICS stxdinh ON i,j FROM stxdinh;
+VACUUM ANALYZE stxdinh, stxdinh1;
+-- Since the stats object does not include inherited stats, it should not affect the estimates
+SELECT * FROM check_estimated_rows('SELECT * FROM stxdinh* GROUP BY 1,2');
+ estimated | actual 
+---+
+  1000 |   1008
+(1 row)
+
+DROP TABLE stxdinh, stxdinh1;
 -- basic test for statistics on expressions
 CREATE TABLE ab1 (a INTEGER, b INTEGER, c TIMESTAMP, d TIMESTAMPTZ);
 -- expression stats may be built on a single expression column
diff --git a/src/test/regress/sql/stats_ext.sql b/src/test/regress/sql/stats_ext.sql
index 6fb37962a7..610f7ed17f 100644
--- a/src/test/regress/sql/stats_ext.sql
+++ b/src/test/regress/sql/stats_ext.sql
@@ -112,6 +112,20 @@ CREATE STATISTICS ab1_a_b_stats ON a, b FROM ab1;
 ANALYZE ab1;
 DROP TABLE ab1 CASCADE;
 
+-- Ensure non-inherited stats are not applied to inherited query
+CREATE TABLE stxdinh(i int, j int);
+CREATE TABLE stxdinh1() INHERITS(stxdinh);
+INSERT INTO stxdinh SELECT a, a/10 FROM generate_series(1,9)a;
+INSERT INTO stxdinh1 SELECT a, a FROM generate_series(1,999)a;
+VACUUM ANALYZE stxdinh, stxdinh1;
+-- Without stats object, it looks like this
+SELECT * FROM check_estimated_rows('SELECT * FROM stxdinh* GROUP BY 1,2');
+CREATE STATISTICS stxdinh ON i,j FROM stxdinh;
+VACUUM ANALYZE stxdinh, stxdinh1;
+-- Since the stats object does not include inherited stats, it should not affect the estimates
+SELECT * FROM check_estimated_rows('SELECT * FROM stxdinh* GROUP BY 1,2');
+DROP TABLE stxdinh, stxdinh1;
+
 -- basic test for statistics on expressions
 CREATE TABLE ab1 (a INTEGER, b INTEGER, c TIMESTAMP, d TIMESTAMPTZ);
 
-- 
2.17.0

>From b5bcff5331d052ea753d25ec7f443dc1d807fb13 Mon Sep 17 00:00:00 2001
From: Tomas Vondra 
Date: Sat, 25 Sep 2021 23:01:21 +0200
Subject: [PATCH v2 2/5] Build inherited extended stats on partitioned tables

Since 859b3003de, ext stats on partitioned tables are not built, which is a
regression.  For back branches, pg_statistic_ext cannot support both inherited
(FROM) and non-inherited (FROM ONLY) stats on inheritence heirarchies.
But there's no issue building inherited stats for partitioned tables, which
are empty, so cannot have non-inherited stats.

See also: 8c5cdb7f4f6e1d6a6104cb58ce4f23453891651b

https://www.postgresql.org/message-id/20210923212624.GI831%40telsasoft.com

Backpatch to v10
---
 src/backend/commands/analyze.c  |  5 -
 src/backend/statistics/dependencies.c   |  2 +-
 src/backend/statistics/extended_stats.c |  2 +-
 src/backend/utils/adt/selfuncs.c|  9 ++---
 src/test/regress/expected/stats_ext.out | 19 +++
 src/test/regress/sql/stats_ext.sql  | 10 ++
 6 files changed, 41 insertions(+), 6 deletions(-)

diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index 8bfb2ad958..299f4893b8 100644
--- a/src/backend/commands/analyze.c
+++ b/src/backend/commands/analyze.c
@@ -548,6 +548,7 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
 	{
 		MemoryContext col_context,
 	old_context;
+		bool		build_ext_stats;
 
 		pgstat_progress_update_param(PROGRESS_ANALYZE_PHASE,
 	 PROGRESS_ANALYZE_PHASE_COMPUTE_STATS);
@@ -611,13 +612,15 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
 			thisdata->attr_cnt, thisdata->vacattrstats);
 		}
 
+		build_ext_stats = (onerel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE) ? inh : (!inh);
+
 		/*
 		 * Build extended statistic

Re: extended stats on partitioned tables

2021-10-07 Thread Jaime Casanova
On Sun, Sep 26, 2021 at 03:25:50PM -0500, Justin Pryzby wrote:
> On Sat, Sep 25, 2021 at 05:31:52PM -0500, Justin Pryzby wrote:
> > It seems like your patch should also check "inh" in examine_variable and
> > statext_expressions_load.
> 
> I tried adding that - I mostly kept my patches separate.
> Hopefully this is more helpful than a complication.
> I added at: https://commitfest.postgresql.org/35/3332/
> 

Actually, this is confusing. Which patch is the one we should be
reviewing?

-- 
Jaime Casanova
Director de Servicios Profesionales
SystemGuards - Consultores de PostgreSQL




Re: extended stats on partitioned tables

2021-09-26 Thread Justin Pryzby
;statlist)
@@ -5232,6 +5237,10 @@ examine_variable(PlannerInfo *root, Node *node, int varRelid,
 			if (vardata->statsTuple)
 break;
 
+			/* If it's an inheritence tree, skip statistics (which do not include child stats) */
+			if (planner_rt_fetch(onerel->relid, root)->inh)
+break;
+
 			/* skip stats without per-expression stats */
 			if (info->kind != STATS_EXT_EXPRESSIONS)
 continue;
diff --git a/src/test/regress/expected/stats_ext.out b/src/test/regress/expected/stats_ext.out
index c60ba45aba..5c15e44bd6 100644
--- a/src/test/regress/expected/stats_ext.out
+++ b/src/test/regress/expected/stats_ext.out
@@ -176,6 +176,29 @@ CREATE STATISTICS ab1_a_b_stats ON a, b FROM ab1;
 ANALYZE ab1;
 DROP TABLE ab1 CASCADE;
 NOTICE:  drop cascades to table ab1c
+-- Ensure non-inherited stats are not applied to inherited query
+CREATE TABLE stxdinh(i int, j int);
+CREATE TABLE stxdinh1() INHERITS(stxdinh);
+INSERT INTO stxdinh SELECT a, a/10 FROM generate_series(1,9)a;
+INSERT INTO stxdinh1 SELECT a, a FROM generate_series(1,999)a;
+VACUUM ANALYZE stxdinh, stxdinh1;
+-- Without stats object, it looks like this
+SELECT * FROM check_estimated_rows('SELECT * FROM stxdinh* GROUP BY 1,2');
+ estimated | actual 
+---+
+  1000 |   1008
+(1 row)
+
+CREATE STATISTICS stxdinh ON i,j FROM stxdinh;
+VACUUM ANALYZE stxdinh, stxdinh1;
+-- Since the stats object does not include inherited stats, it should not affect the estimates
+SELECT * FROM check_estimated_rows('SELECT * FROM stxdinh* GROUP BY 1,2');
+ estimated | actual 
+---+
+  1000 |   1008
+(1 row)
+
+DROP TABLE stxdinh, stxdinh1;
 -- basic test for statistics on expressions
 CREATE TABLE ab1 (a INTEGER, b INTEGER, c TIMESTAMP, d TIMESTAMPTZ);
 -- expression stats may be built on a single expression column
diff --git a/src/test/regress/sql/stats_ext.sql b/src/test/regress/sql/stats_ext.sql
index 6fb37962a7..610f7ed17f 100644
--- a/src/test/regress/sql/stats_ext.sql
+++ b/src/test/regress/sql/stats_ext.sql
@@ -112,6 +112,20 @@ CREATE STATISTICS ab1_a_b_stats ON a, b FROM ab1;
 ANALYZE ab1;
 DROP TABLE ab1 CASCADE;
 
+-- Ensure non-inherited stats are not applied to inherited query
+CREATE TABLE stxdinh(i int, j int);
+CREATE TABLE stxdinh1() INHERITS(stxdinh);
+INSERT INTO stxdinh SELECT a, a/10 FROM generate_series(1,9)a;
+INSERT INTO stxdinh1 SELECT a, a FROM generate_series(1,999)a;
+VACUUM ANALYZE stxdinh, stxdinh1;
+-- Without stats object, it looks like this
+SELECT * FROM check_estimated_rows('SELECT * FROM stxdinh* GROUP BY 1,2');
+CREATE STATISTICS stxdinh ON i,j FROM stxdinh;
+VACUUM ANALYZE stxdinh, stxdinh1;
+-- Since the stats object does not include inherited stats, it should not affect the estimates
+SELECT * FROM check_estimated_rows('SELECT * FROM stxdinh* GROUP BY 1,2');
+DROP TABLE stxdinh, stxdinh1;
+
 -- basic test for statistics on expressions
 CREATE TABLE ab1 (a INTEGER, b INTEGER, c TIMESTAMP, d TIMESTAMPTZ);
 
-- 
2.17.0

>From 1c35e88903222e8f1624babd3900a499fdfee2f2 Mon Sep 17 00:00:00 2001
From: Tomas Vondra 
Date: Sat, 25 Sep 2021 23:01:21 +0200
Subject: [PATCH 2/5] Build inherited extended stats on partitioned tables

Since 859b3003de, ext stats on partitioned tables are not built, which is a
regression.  For back branches, pg_statistic_ext cannot support both inherited
(FROM) and non-inherited (FROM ONLY) stats on inheritence heirarchies.
But there's no issue building inherited stats for partitioned tables, which
are empty, so cannot have non-inherited stats.

See also: 8c5cdb7f4f6e1d6a6104cb58ce4f23453891651b

https://www.postgresql.org/message-id/20210923212624.GI831%40telsasoft.com

Backpatch to v10
---
 src/backend/commands/analyze.c  |  5 -
 src/backend/statistics/dependencies.c   |  2 +-
 src/backend/statistics/extended_stats.c |  2 +-
 src/backend/utils/adt/selfuncs.c|  9 ++---
 src/test/regress/expected/stats_ext.out | 19 +++
 src/test/regress/sql/stats_ext.sql  | 10 ++
 6 files changed, 41 insertions(+), 6 deletions(-)

diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index 8bfb2ad958..299f4893b8 100644
--- a/src/backend/commands/analyze.c
+++ b/src/backend/commands/analyze.c
@@ -548,6 +548,7 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
 	{
 		MemoryContext col_context,
 	old_context;
+		bool		build_ext_stats;
 
 		pgstat_progress_update_param(PROGRESS_ANALYZE_PHASE,
 	 PROGRESS_ANALYZE_PHASE_COMPUTE_STATS);
@@ -611,13 +612,15 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
 			thisdata->attr_cnt, thisdata->vacattrstats);
 		}
 
+		build_ext_stats = (onerel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE) ? inh : (!inh);
+
 		/*
 		 * Build extended statistics (if there are any).
 		 *
 		 * For now we only build extended statistics on individual relations,
 		 * not for relations representing inheritance trees.
 		 */
-		if (!inh)
+		if (build_ext_sta

Re: extended stats on partitioned tables

2021-09-26 Thread Tomas Vondra
On 9/25/21 11:46 PM, Justin Pryzby wrote:
> On Sat, Sep 25, 2021 at 11:01:21PM +0200, Tomas Vondra wrote:
>>> Do you think it's possible to backpatch a fix to handle partitioned tables
>>> specifically ?
>>>
>>> The "tuple already updated" error which I reported and which was fixed by
>>> 859b3003 involved inheritence children.  Since partitioned tables have no 
>>> data
>>> themselves, the !inh check could be relaxed.  It's not totally clear to me 
>>> if
>>> the correct statistics would be used in that case.  I suppose the wrong
>>> (inherited) stats would be wrongly applied affect queries FROM ONLY a
>>> partitioned table, which seems pointless to write and also hard for the
>>> estimates to be far off :)
>>
>> Hmmm, maybe. To prevent the "tuple concurrently updated" we must ensure we
>> never build stats with and without inheritance at the same time (for the
>> same rel). The 859b3003de ensures that by only building extended stats in
>> the (!inh) case, but we might tweak that based on relkind. See the attached
>> patch. But I wonder if there are cases that might be hurt by this - that'd
>> be a regression too, of course.
> 
> I think we should leave the inheritance case alone, since it hasn't changed in
> 2 years, and building stats on the table ONLY is a legitimate interpretation,
> and it's as good as we can do without the catalog change.
> 
> But the partitioned case used to work, and there's no utility in selecting 
> FROM
> ONLY a partitioned table, so we might as well build the stats including its
> partitions.
> 
> I don't think anything would get worse for the partitioned case.
> Obviously building inherited ext stats could change plans - that's the point.
> It's weird that the stats objects which existed for 18 months before being
> "built" after the patch was applied, but no so weird that the release notes
> wouldn't be ample documentation.
> 

Agreed.

> If building statistics caused the plan to change undesirably, the solution
> would be to drop the stats object, of course.
> 
> +   build_ext_stats = (onerel->rd_rel->relkind == 
> RELKIND_PARTITIONED_TABLE) ? inh : (!inh);
>   
> 
> 
> It's weird to build inherited extended stats for partitioned tables but not 
> for
> inheritence parents.  We could be clever and say "build inherited ext stats 
> for
> inheritence parents only if we didn't insert any stats for the table itself
> (because it's empty)".  But I think that's fragile: a single tuple in the
> parent table could cause stats to be built there instead of on its heirarchy,
> and the extended stats would be used for *both* FROM and FROM ONLY, which is 
> an
> awful combination.
> 

I don't think there's a good way to check if there are any rows in the
parent relation. And even then, a single row might cause huge changes to
query plans (essentially switching to very different stats).

> Since do_analyze_rel is only called once for partitioned tables, I think you
> could write that as:
> 
> /* Do not build inherited stats (since the catalog cannot support it) except
>  * for partitioned tables, for which numrows==0 and have no non-inherited 
> stats */
> build_ext_stats = !inh || onerel->rd_rel->relkind == 
> RELKIND_PARTITIONED_TABLE;
> 

Good point.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: extended stats on partitioned tables

2021-09-25 Thread Justin Pryzby
(Resending with -hackers)

It seems like your patch should also check "inh" in examine_variable and
statext_expressions_load.

Which leads to another issue in stable branches:

ANALYZE builds only non-inherited stats, but they're incorrectly used for
inherited queries - the rowcount estimate is worse on inheritence parents with
extended stats than without.

 CREATE TABLE p(i int, j int);
 CREATE TABLE p1() INHERITS(p);
 INSERT INTO p SELECT a, a/10 FROM generate_series(1,9)a;
 INSERT INTO p1 SELECT a, a FROM generate_series(1,999)a;
 CREATE STATISTICS ps ON i,j FROM p;
 VACUUM ANALYZE p,p1;

postgres=# explain analyze SELECT * FROM p GROUP BY 1,2;
 HashAggregate  (cost=26.16..26.25 rows=9 width=8) (actual time=2.571..3.282 
rows=1008 loops=1)

postgres=# begin; DROP STATISTICS ps; explain analyze SELECT * FROM p GROUP BY 
1,2; rollback;
 HashAggregate  (cost=26.16..36.16 rows=1000 width=8) (actual time=2.167..2.872 
rows=1008 loops=1)

I guess examine_variable() should have corresponding logic to the hardcoded
!inh in analyze.c.

-- 
Justin




Re: extended stats on partitioned tables

2021-09-25 Thread Justin Pryzby
On Sat, Sep 25, 2021 at 11:01:21PM +0200, Tomas Vondra wrote:
> > Do you think it's possible to backpatch a fix to handle partitioned tables
> > specifically ?
> > 
> > The "tuple already updated" error which I reported and which was fixed by
> > 859b3003 involved inheritence children.  Since partitioned tables have no 
> > data
> > themselves, the !inh check could be relaxed.  It's not totally clear to me 
> > if
> > the correct statistics would be used in that case.  I suppose the wrong
> > (inherited) stats would be wrongly applied affect queries FROM ONLY a
> > partitioned table, which seems pointless to write and also hard for the
> > estimates to be far off :)
> 
> Hmmm, maybe. To prevent the "tuple concurrently updated" we must ensure we
> never build stats with and without inheritance at the same time (for the
> same rel). The 859b3003de ensures that by only building extended stats in
> the (!inh) case, but we might tweak that based on relkind. See the attached
> patch. But I wonder if there are cases that might be hurt by this - that'd
> be a regression too, of course.

I think we should leave the inheritance case alone, since it hasn't changed in
2 years, and building stats on the table ONLY is a legitimate interpretation,
and it's as good as we can do without the catalog change.

But the partitioned case used to work, and there's no utility in selecting FROM
ONLY a partitioned table, so we might as well build the stats including its
partitions.

I don't think anything would get worse for the partitioned case.
Obviously building inherited ext stats could change plans - that's the point.
It's weird that the stats objects which existed for 18 months before being
"built" after the patch was applied, but no so weird that the release notes
wouldn't be ample documentation.

If building statistics caused the plan to change undesirably, the solution
would be to drop the stats object, of course.

+   build_ext_stats = (onerel->rd_rel->relkind == 
RELKIND_PARTITIONED_TABLE) ? inh : (!inh);      
        

It's weird to build inherited extended stats for partitioned tables but not for
inheritence parents.  We could be clever and say "build inherited ext stats for
inheritence parents only if we didn't insert any stats for the table itself
(because it's empty)".  But I think that's fragile: a single tuple in the
parent table could cause stats to be built there instead of on its heirarchy,
and the extended stats would be used for *both* FROM and FROM ONLY, which is an
awful combination.

Since do_analyze_rel is only called once for partitioned tables, I think you
could write that as:

/* Do not build inherited stats (since the catalog cannot support it) except
 * for partitioned tables, for which numrows==0 and have no non-inherited stats 
*/
build_ext_stats = !inh || onerel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE;

-- 
Justin




Re: extended stats on partitioned tables

2021-09-25 Thread Tomas Vondra



On 9/25/21 9:53 PM, Justin Pryzby wrote:

On Sat, Sep 25, 2021 at 09:27:10PM +0200, Tomas Vondra wrote:

On 9/23/21 11:26 PM, Justin Pryzby wrote:

extended stats objects are allowed on partitioned tables since v10.
https://www.postgresql.org/message-id/flat/CAKJS1f-BmGo410bh5RSPZUvOO0LhmHL2NYmdrC_Jm8pk_FfyCA%40mail.gmail.com
8c5cdb7f4f6e1d6a6104cb58ce4f23453891651b

But since 859b3003de they're not populated - pg_statistic_ext(_data) is empty.
This was the consequence of a commit to avoid an error I reported with stats on
inheritence parents (not partitioned tables).

preceding 859b3003de, stats on the parent table *did* improve the estimate,
so this part of the commit message seems to have been wrong?
|commit 859b3003de87645b62ee07ef245d6c1f1cd0cedb
|Don't build extended statistics on inheritance trees
...
|Moreover, the current selectivity estimation code only works with 
individual
|relations, so building statistics on inheritance trees would be pointless
|anyway.

|CREATE TABLE p (i int, a int, b int) PARTITION BY RANGE (i);
|CREATE TABLE pd PARTITION OF p FOR VALUES FROM (1)TO(100);
|TRUNCATE p; INSERT INTO p SELECT 1, a/100, a/100 FROM generate_series(1,999)a;
|CREATE STATISTICS pp ON (a),(b) FROM p;
|VACUUM ANALYZE p;
|SELECT * FROM pg_statistic_ext WHERE stxrelid ='p'::regclass;

|postgres=# begin; DROP STATISTICS pp; explain analyze SELECT a,b FROM p GROUP 
BY 1,2; abort;
| HashAggregate  (cost=20.98..21.98 rows=100 width=8) (actual time=1.088..1.093 
rows=10 loops=1)

|postgres=# explain analyze SELECT a,b FROM p GROUP BY 1,2;
| HashAggregate  (cost=20.98..21.09 rows=10 width=8) (actual time=1.082..1.086 
rows=10 loops=1)

So I think this is a regression, and extended stats should be populated for
partitioned tables - I had actually done that for some parent tables and hadn't
noticed that the stats objects no longer do anything.

...

Agreed, that seems like a regression, but I don't see how to fix that
without having the extra flag in the catalog. Otherwise we can store just
one version for each statistics object :-(


Do you think it's possible to backpatch a fix to handle partitioned tables
specifically ?

The "tuple already updated" error which I reported and which was fixed by
859b3003 involved inheritence children.  Since partitioned tables have no data
themselves, the !inh check could be relaxed.  It's not totally clear to me if
the correct statistics would be used in that case.  I suppose the wrong
(inherited) stats would be wrongly applied affect queries FROM ONLY a
partitioned table, which seems pointless to write and also hard for the
estimates to be far off :)



Hmmm, maybe. To prevent the "tuple concurrently updated" we must ensure 
we never build stats with and without inheritance at the same time (for 
the same rel). The 859b3003de ensures that by only building extended 
stats in the (!inh) case, but we might tweak that based on relkind. See 
the attached patch. But I wonder if there are cases that might be hurt 
by this - that'd be a regression too, of course.



Attached is a PoC that I quickly bashed together today. It's pretty raw, but
it passed "make check" and I think it does most of the things right. Can you
try if this fixes the estimates with partitioned tables?


I think pg_stats_ext_exprs also needs to expose the inherited flag.



Yeah, I only did the bare minimum to get the PoC working. I'm sure there 
are various other loose ends.



regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index 8bfb2ad958..299f4893b8 100644
--- a/src/backend/commands/analyze.c
+++ b/src/backend/commands/analyze.c
@@ -548,6 +548,7 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
 	{
 		MemoryContext col_context,
 	old_context;
+		bool		build_ext_stats;
 
 		pgstat_progress_update_param(PROGRESS_ANALYZE_PHASE,
 	 PROGRESS_ANALYZE_PHASE_COMPUTE_STATS);
@@ -611,13 +612,15 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
 			thisdata->attr_cnt, thisdata->vacattrstats);
 		}
 
+		build_ext_stats = (onerel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE) ? inh : (!inh);
+
 		/*
 		 * Build extended statistics (if there are any).
 		 *
 		 * For now we only build extended statistics on individual relations,
 		 * not for relations representing inheritance trees.
 		 */
-		if (!inh)
+		if (build_ext_stats)
 			BuildRelationExtStatistics(onerel, totalrows, numrows, rows,
 	   attr_cnt, vacattrstats);
 	}


Re: extended stats on partitioned tables

2021-09-25 Thread Justin Pryzby
On Sat, Sep 25, 2021 at 09:27:10PM +0200, Tomas Vondra wrote:
> On 9/23/21 11:26 PM, Justin Pryzby wrote:
> > extended stats objects are allowed on partitioned tables since v10.
> > https://www.postgresql.org/message-id/flat/CAKJS1f-BmGo410bh5RSPZUvOO0LhmHL2NYmdrC_Jm8pk_FfyCA%40mail.gmail.com
> > 8c5cdb7f4f6e1d6a6104cb58ce4f23453891651b
> > 
> > But since 859b3003de they're not populated - pg_statistic_ext(_data) is 
> > empty.
> > This was the consequence of a commit to avoid an error I reported with 
> > stats on
> > inheritence parents (not partitioned tables).
> > 
> > preceding 859b3003de, stats on the parent table *did* improve the estimate,
> > so this part of the commit message seems to have been wrong?
> > |commit 859b3003de87645b62ee07ef245d6c1f1cd0cedb
> > |Don't build extended statistics on inheritance trees
> > ...
> > |Moreover, the current selectivity estimation code only works with 
> > individual
> > |relations, so building statistics on inheritance trees would be 
> > pointless
> > |anyway.
> > 
> > |CREATE TABLE p (i int, a int, b int) PARTITION BY RANGE (i);
> > |CREATE TABLE pd PARTITION OF p FOR VALUES FROM (1)TO(100);
> > |TRUNCATE p; INSERT INTO p SELECT 1, a/100, a/100 FROM 
> > generate_series(1,999)a;
> > |CREATE STATISTICS pp ON (a),(b) FROM p;
> > |VACUUM ANALYZE p;
> > |SELECT * FROM pg_statistic_ext WHERE stxrelid ='p'::regclass;
> > 
> > |postgres=# begin; DROP STATISTICS pp; explain analyze SELECT a,b FROM p 
> > GROUP BY 1,2; abort;
> > | HashAggregate  (cost=20.98..21.98 rows=100 width=8) (actual 
> > time=1.088..1.093 rows=10 loops=1)
> > 
> > |postgres=# explain analyze SELECT a,b FROM p GROUP BY 1,2;
> > | HashAggregate  (cost=20.98..21.09 rows=10 width=8) (actual 
> > time=1.082..1.086 rows=10 loops=1)
> > 
> > So I think this is a regression, and extended stats should be populated for
> > partitioned tables - I had actually done that for some parent tables and 
> > hadn't
> > noticed that the stats objects no longer do anything.
...
> Agreed, that seems like a regression, but I don't see how to fix that
> without having the extra flag in the catalog. Otherwise we can store just
> one version for each statistics object :-(

Do you think it's possible to backpatch a fix to handle partitioned tables
specifically ?

The "tuple already updated" error which I reported and which was fixed by
859b3003 involved inheritence children.  Since partitioned tables have no data
themselves, the !inh check could be relaxed.  It's not totally clear to me if
the correct statistics would be used in that case.  I suppose the wrong
(inherited) stats would be wrongly applied affect queries FROM ONLY a
partitioned table, which seems pointless to write and also hard for the
estimates to be far off :)

> Attached is a PoC that I quickly bashed together today. It's pretty raw, but
> it passed "make check" and I think it does most of the things right. Can you
> try if this fixes the estimates with partitioned tables?

I think pg_stats_ext_exprs also needs to expose the inherited flag.

Thanks,
-- 
Justin




Re: extended stats on partitioned tables

2021-09-25 Thread Tomas Vondra

On 9/23/21 11:26 PM, Justin Pryzby wrote:

extended stats objects are allowed on partitioned tables since v10.
https://www.postgresql.org/message-id/flat/CAKJS1f-BmGo410bh5RSPZUvOO0LhmHL2NYmdrC_Jm8pk_FfyCA%40mail.gmail.com
8c5cdb7f4f6e1d6a6104cb58ce4f23453891651b

But since 859b3003de they're not populated - pg_statistic_ext(_data) is empty.
This was the consequence of a commit to avoid an error I reported with stats on
inheritence parents (not partitioned tables).

preceding 859b3003de, stats on the parent table *did* improve the estimate,
so this part of the commit message seems to have been wrong?
|commit 859b3003de87645b62ee07ef245d6c1f1cd0cedb
|Don't build extended statistics on inheritance trees
...
|Moreover, the current selectivity estimation code only works with 
individual
|relations, so building statistics on inheritance trees would be pointless
|anyway.

|CREATE TABLE p (i int, a int, b int) PARTITION BY RANGE (i);
|CREATE TABLE pd PARTITION OF p FOR VALUES FROM (1)TO(100);
|TRUNCATE p; INSERT INTO p SELECT 1, a/100, a/100 FROM generate_series(1,999)a;
|CREATE STATISTICS pp ON (a),(b) FROM p;
|VACUUM ANALYZE p;
|SELECT * FROM pg_statistic_ext WHERE stxrelid ='p'::regclass;

|postgres=# begin; DROP STATISTICS pp; explain analyze SELECT a,b FROM p GROUP 
BY 1,2; abort;
| HashAggregate  (cost=20.98..21.98 rows=100 width=8) (actual time=1.088..1.093 
rows=10 loops=1)

|postgres=# explain analyze SELECT a,b FROM p GROUP BY 1,2;
| HashAggregate  (cost=20.98..21.09 rows=10 width=8) (actual time=1.082..1.086 
rows=10 loops=1)

So I think this is a regression, and extended stats should be populated for
partitioned tables - I had actually done that for some parent tables and hadn't
noticed that the stats objects no longer do anything.

That begs the question if the current behavior for inheritence parents is
correct..

CREATE TABLE p (i int, a int, b int);
CREATE TABLE pd () INHERITS (p);
INSERT INTO pd SELECT 1, a/100, a/100 FROM generate_series(1,999)a;
CREATE STATISTICS pp ON (a),(b) FROM p;
VACUUM ANALYZE p;
explain analyze SELECT a,b FROM p GROUP BY 1,2;

| HashAggregate  (cost=25.99..26.99 rows=100 width=8) (actual time=3.268..3.284 
rows=10 loops=1)



Agreed, that seems like a regression, but I don't see how to fix that 
without having the extra flag in the catalog. Otherwise we can store 
just one version for each statistics object :-(



Since child tables can be queried directly, it's a legitimate question whether
we should collect stats for the table heirarchy or (since the catalog only
supports one) only the table itself.  I'd think that stats for the table
hierarchy would be more commonly useful (but we shouldn't change the behavior
in existing releases again).  Anyway it seems unfortunate that
statistic_ext_data still has no stxinherited.



Yeah, we probably need the flag - I planned to get it into 14, but then 
I got distracted by something else :-/


Attached is a PoC that I quickly bashed together today. It's pretty raw, 
but it passed "make check" and I think it does most of the things right. 
Can you try if this fixes the estimates with partitioned tables?


Extended statistics use two catalogs, pg_statistic_ext for definition, 
while pg_statistic_ext_data stores the built statistics objects - the 
flag needs to be in the "data" catalog, and managing the records is a 
bit challenging - the current PoC code mostly works, but I had to relax 
some error checks and I'm sure there are cases when we fail to remove a 
row, or something like that.



Note that for partitioned tables if I enable enable_partitionwise_aggregate,
then stats objects on the child tables can be helpful (but that's also
confusing to the question at hand).



Yeah. I think it'd be helpful to assemble a script with various test 
cases demonstrating how we estimate various cases.



regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 2f0def9b19..e256a5533c 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -7441,6 +7441,19 @@ SCRAM-SHA-256$iteration count:
created with CREATE STATISTICS.
   
 
+  
+   Normally there is one entry, with stxdinherit =
+   false, for each statistics object that has been analyzed.
+   If the table has inheritance children, a second entry with
+   stxdinherit = true is also created.
+   This row represents the statistics object over the inheritance tree, i.e.,
+   statistics for the data you'd see with
+   SELECT * FROM table*,
+   whereas the stxdinherit = false row
+   represents the results of
+   SELECT * FROM ONLY table.
+  
+
   
Like pg_statistic,
pg_statistic_ext_data should not be
@@ -7480,6 +7493,16 @@ SCRAM-SHA-256$iteration count:
   
  
 
+ 
+  
+   stxdinherit bool
+  
+  
+   If true, the stats include inheritance child columns, not just the
+   values in the 

extended stats on partitioned tables

2021-09-23 Thread Justin Pryzby
extended stats objects are allowed on partitioned tables since v10.
https://www.postgresql.org/message-id/flat/CAKJS1f-BmGo410bh5RSPZUvOO0LhmHL2NYmdrC_Jm8pk_FfyCA%40mail.gmail.com
8c5cdb7f4f6e1d6a6104cb58ce4f23453891651b

But since 859b3003de they're not populated - pg_statistic_ext(_data) is empty.
This was the consequence of a commit to avoid an error I reported with stats on
inheritence parents (not partitioned tables).

preceding 859b3003de, stats on the parent table *did* improve the estimate,
so this part of the commit message seems to have been wrong?
|commit 859b3003de87645b62ee07ef245d6c1f1cd0cedb
|Don't build extended statistics on inheritance trees
...
|Moreover, the current selectivity estimation code only works with 
individual
|relations, so building statistics on inheritance trees would be pointless
|anyway.

|CREATE TABLE p (i int, a int, b int) PARTITION BY RANGE (i);
|CREATE TABLE pd PARTITION OF p FOR VALUES FROM (1)TO(100);
|TRUNCATE p; INSERT INTO p SELECT 1, a/100, a/100 FROM generate_series(1,999)a;
|CREATE STATISTICS pp ON (a),(b) FROM p;
|VACUUM ANALYZE p;
|SELECT * FROM pg_statistic_ext WHERE stxrelid ='p'::regclass;

|postgres=# begin; DROP STATISTICS pp; explain analyze SELECT a,b FROM p GROUP 
BY 1,2; abort;
| HashAggregate  (cost=20.98..21.98 rows=100 width=8) (actual time=1.088..1.093 
rows=10 loops=1)

|postgres=# explain analyze SELECT a,b FROM p GROUP BY 1,2;
| HashAggregate  (cost=20.98..21.09 rows=10 width=8) (actual time=1.082..1.086 
rows=10 loops=1)

So I think this is a regression, and extended stats should be populated for
partitioned tables - I had actually done that for some parent tables and hadn't
noticed that the stats objects no longer do anything.

That begs the question if the current behavior for inheritence parents is
correct..

CREATE TABLE p (i int, a int, b int);
CREATE TABLE pd () INHERITS (p);
INSERT INTO pd SELECT 1, a/100, a/100 FROM generate_series(1,999)a;
CREATE STATISTICS pp ON (a),(b) FROM p;
VACUUM ANALYZE p;
explain analyze SELECT a,b FROM p GROUP BY 1,2;

| HashAggregate  (cost=25.99..26.99 rows=100 width=8) (actual time=3.268..3.284 
rows=10 loops=1)

Since child tables can be queried directly, it's a legitimate question whether
we should collect stats for the table heirarchy or (since the catalog only
supports one) only the table itself.  I'd think that stats for the table
hierarchy would be more commonly useful (but we shouldn't change the behavior
in existing releases again).  Anyway it seems unfortunate that
statistic_ext_data still has no stxinherited.

Note that for partitioned tables if I enable enable_partitionwise_aggregate,
then stats objects on the child tables can be helpful (but that's also
confusing to the question at hand).