Hi,

On 10/8/21 14:58, Justin Pryzby wrote:
Cleaned up and attached as a .patch.

The patch implementing autoanalyze on partitioned tables should
revert relevant portions of this patch.

I went through this patch and I'd like to propose a couple changes, per the 0002 patch:

1) I've reworded the changes in maintenance.sgml a bit. It sounded a bit strange before, but I'm not a native speaker so maybe it's worse ...

2) Remove unnecessary whitespace changes in perform.sgml.

3) Simplify the analyze.sgml changes a bit - it was trying to cram too much stuff into a single paragraph, so I split that.

Does that seem OK, or did omit something important?

FWIW I think it's really confusing we have inheritance and partitioning, and partitions and child tables. And sometimes we use partitioning in the generic sense (i.e. including the inheritance approach), and sometimes only the declarative variant. Same for partitions vs child tables. I can't even imagine how confusing this has to be for people just learning this stuff. They must be in permanent WTF?! state ...

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
From c5105f2f067dff553045792d4a171fc835fe2109 Mon Sep 17 00:00:00 2001
From: Justin Pryzby <pryz...@telsasoft.com>
Date: Thu, 22 Jul 2021 16:06:18 -0500
Subject: [PATCH v2 1/2] documentation deficiencies for ANALYZE of partitioned
 tables

This is partially extracted from 1b5617eb844cd2470a334c1d2eec66cf9b39c41a,
which was reverted.
---
 doc/src/sgml/maintenance.sgml | 27 ++++++++++++++++++++++++++
 doc/src/sgml/perform.sgml     |  2 ++
 doc/src/sgml/ref/analyze.sgml | 36 ++++++++++++++++++++++-------------
 3 files changed, 52 insertions(+), 13 deletions(-)

diff --git a/doc/src/sgml/maintenance.sgml b/doc/src/sgml/maintenance.sgml
index 36f975b1e5b..b7c806cc906 100644
--- a/doc/src/sgml/maintenance.sgml
+++ b/doc/src/sgml/maintenance.sgml
@@ -290,6 +290,14 @@
     to meaningful statistical changes.
    </para>
 
+   <para>
+    Tuples changed in partitions and inheritance children do not count towards
+    analyze on the parent table.  If the parent table is empty or rarely
+    changed, it may never be processed by autovacuum.  It is necessary to
+    periodically manual run <command>ANALYZE</command> on the parent table to
+    keep the statistics of its table hierarchy up to date.
+   </para>
+
    <para>
     As with vacuuming for space recovery, frequent updates of statistics
     are more useful for heavily-updated tables than for seldom-updated
@@ -347,6 +355,18 @@
      <command>ANALYZE</command> commands on those tables on a suitable schedule.
     </para>
    </tip>
+
+   <tip>
+    <para>
+     The autovacuum daemon does not issue <command>ANALYZE</command> commands for
+     partitioned tables.  Inheritance parents will only be analyzed if the
+     parent itself is changed - changes to child tables do not trigger
+     autoanalyze on the parent table.  It is necessary to periodically run a
+     manual <command>ANALYZE</command> to keep the statistics of the table
+     hierarchy up to date.
+    </para>
+   </tip>
+
   </sect2>
 
   <sect2 id="vacuum-for-visibility-map">
@@ -819,6 +839,13 @@ analyze threshold = analyze base threshold + analyze scale factor * number of tu
     since the last <command>ANALYZE</command>.
    </para>
 
+   <para>
+    Partitioned tables are not processed by autovacuum.  Statistics
+    should be collected by running a manual <command>ANALYZE</command> when it is
+    first populated, and updated whenever the distribution of data in its
+    partitions changes significantly.
+   </para>
+
    <para>
     Temporary tables cannot be accessed by autovacuum.  Therefore,
     appropriate vacuum and analyze operations should be performed via
diff --git a/doc/src/sgml/perform.sgml b/doc/src/sgml/perform.sgml
index 89ff58338e5..b84853fd6ff 100644
--- a/doc/src/sgml/perform.sgml
+++ b/doc/src/sgml/perform.sgml
@@ -1765,9 +1765,11 @@ SELECT * FROM x, y, a, b, c WHERE something AND somethingelse;
    <title>Run <command>ANALYZE</command> Afterwards</title>
 
    <para>
+
     Whenever you have significantly altered the distribution of data
     within a table, running <link linkend="sql-analyze"><command>ANALYZE</command></link> is strongly recommended. This
     includes bulk loading large amounts of data into the table.  Running
+
     <command>ANALYZE</command> (or <command>VACUUM ANALYZE</command>)
     ensures that the planner has up-to-date statistics about the
     table.  With no statistics or obsolete statistics, the planner might
diff --git a/doc/src/sgml/ref/analyze.sgml b/doc/src/sgml/ref/analyze.sgml
index c8fcebc1612..ea78f0d0387 100644
--- a/doc/src/sgml/ref/analyze.sgml
+++ b/doc/src/sgml/ref/analyze.sgml
@@ -250,22 +250,32 @@ ANALYZE [ VERBOSE ] [ <replaceable class="parameter">table_and_columns</replacea
   </para>
 
   <para>
-    If the table being analyzed has one or more children,
-    <command>ANALYZE</command> will gather statistics twice: once on the
-    rows of the parent table only, and a second time on the rows of the
-    parent table with all of its children.  This second set of statistics
-    is needed when planning queries that traverse the entire inheritance
-    tree.  The autovacuum daemon, however, will only consider inserts or
-    updates on the parent table itself when deciding whether to trigger an
-    automatic analyze for that table.  If that table is rarely inserted into
-    or updated, the inheritance statistics will not be up to date unless you
-    run <command>ANALYZE</command> manually.
+    If the table being analyzed is partitioned, <command>ANALYZE</command>
+    will gather statistics by sampling blocks randomly from its partitions;
+    in addition, it will recurse into each partition and update its statistics.
+    (However, in multi-level partitioning scenarios, each leaf partition
+    will only be analyzed once.)
+    By constrast, if the table being analyzed has inheritance children,
+    <command>ANALYZE</command> will gather statistics for it twice:
+    once on the rows of the parent table only, and a second time on the
+    rows of the parent table with all of its children.  This second set of
+    statistics is needed when planning queries that traverse the entire
+    inheritance tree.  The child tables themselves are not individually
+    analyzed in this case.
   </para>
 
   <para>
-    If any of the child tables are foreign tables whose foreign data wrappers
-    do not support <command>ANALYZE</command>, those child tables are ignored while
-    gathering inheritance statistics.
+    The autovacuum daemon does not process partitioned tables, nor does it
+    process inheritance parents due to modifications to its child tables.
+    It is usually necessary to periodically run a manual
+    <command>ANALYZE</command> to keep the statistics of the table hierarchy
+    up to date.  See...
+  </para>
+
+  <para>
+    If any child tables or partitions are foreign tables whose foreign
+    data wrappers do not support <command>ANALYZE</command>, those tables are
+    ignored while gathering inheritance statistics.
   </para>
 
   <para>
-- 
2.31.1

From 993a42ee2c73a1098845319b30811b5b7497f2f8 Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas.von...@postgresql.org>
Date: Fri, 21 Jan 2022 18:05:56 +0100
Subject: [PATCH v2 2/2] minor changes / rewordings

---
 doc/src/sgml/maintenance.sgml | 20 +++++++++---------
 doc/src/sgml/perform.sgml     |  2 --
 doc/src/sgml/ref/analyze.sgml | 38 +++++++++++++++++++++++++----------
 3 files changed, 38 insertions(+), 22 deletions(-)

diff --git a/doc/src/sgml/maintenance.sgml b/doc/src/sgml/maintenance.sgml
index b7c806cc906..2086dae61b2 100644
--- a/doc/src/sgml/maintenance.sgml
+++ b/doc/src/sgml/maintenance.sgml
@@ -291,11 +291,12 @@
    </para>
 
    <para>
-    Tuples changed in partitions and inheritance children do not count towards
+    Tuples changed in partitions and inheritance children do not trigger
     analyze on the parent table.  If the parent table is empty or rarely
-    changed, it may never be processed by autovacuum.  It is necessary to
-    periodically manual run <command>ANALYZE</command> on the parent table to
-    keep the statistics of its table hierarchy up to date.
+    changed, it may never be processed by autovacuum, and the statistics for
+    the inheritance tree as a whole won't be collected. It is necessary to
+    run <command>ANALYZE</command> on the parent table manually, to keep
+    the statistics up to date.
    </para>
 
    <para>
@@ -358,12 +359,13 @@
 
    <tip>
     <para>
-     The autovacuum daemon does not issue <command>ANALYZE</command> commands for
-     partitioned tables.  Inheritance parents will only be analyzed if the
+     The autovacuum daemon may not issue <command>ANALYZE</command> commands
+     for partitioned tables.  Inheritance parents will only be analyzed if the
      parent itself is changed - changes to child tables do not trigger
-     autoanalyze on the parent table.  It is necessary to periodically run a
-     manual <command>ANALYZE</command> to keep the statistics of the table
-     hierarchy up to date.
+     autoanalyze on the parent table.  If your queries require statistics on
+     parent relations for proper planning, it's necessary to periodically run
+     a manual <command>ANALYZE</command> on those tables to keep the statistics
+     up to date.
     </para>
    </tip>
 
diff --git a/doc/src/sgml/perform.sgml b/doc/src/sgml/perform.sgml
index b84853fd6ff..89ff58338e5 100644
--- a/doc/src/sgml/perform.sgml
+++ b/doc/src/sgml/perform.sgml
@@ -1765,11 +1765,9 @@ SELECT * FROM x, y, a, b, c WHERE something AND somethingelse;
    <title>Run <command>ANALYZE</command> Afterwards</title>
 
    <para>
-
     Whenever you have significantly altered the distribution of data
     within a table, running <link linkend="sql-analyze"><command>ANALYZE</command></link> is strongly recommended. This
     includes bulk loading large amounts of data into the table.  Running
-
     <command>ANALYZE</command> (or <command>VACUUM ANALYZE</command>)
     ensures that the planner has up-to-date statistics about the
     table.  With no statistics or obsolete statistics, the planner might
diff --git a/doc/src/sgml/ref/analyze.sgml b/doc/src/sgml/ref/analyze.sgml
index ea78f0d0387..1482fc2d4bb 100644
--- a/doc/src/sgml/ref/analyze.sgml
+++ b/doc/src/sgml/ref/analyze.sgml
@@ -250,18 +250,34 @@ ANALYZE [ VERBOSE ] [ <replaceable class="parameter">table_and_columns</replacea
   </para>
 
   <para>
-    If the table being analyzed is partitioned, <command>ANALYZE</command>
-    will gather statistics by sampling blocks randomly from its partitions;
-    in addition, it will recurse into each partition and update its statistics.
-    (However, in multi-level partitioning scenarios, each leaf partition
-    will only be analyzed once.)
+    If the table being analyzed has one or more children,
+    <command>ANALYZE</command> will gather statistics twice: once on the
+    rows of the parent table only, and a second time on the rows of the
+    parent table with all of its children.  This second set of statistics
+    is needed when planning queries that traverse the entire inheritance
+    tree.  The autovacuum daemon, however, will only consider inserts or
+    updates on the parent table itself when deciding whether to trigger an
+    automatic analyze for that table.  If that table is rarely inserted into
+    or updated, the inheritance statistics will not be up to date unless you
+    run <command>ANALYZE</command> manually.
+  </para>
+
+  <para>
+    For partitioned tables, <command>ANALYZE</command> gathers statistics by
+    sampling rows from all partitions; in addition, it will recurse into each
+    partition and update its statistics.  Each leaf partition is analyzed only
+    once, even with multi-level partitioning.  No statistics are collected for
+    the parent table (ignoring data from partitions), because with partitioning
+    it's guaranteed to be empty.
+  </para>
+
+  <para>
     By constrast, if the table being analyzed has inheritance children,
-    <command>ANALYZE</command> will gather statistics for it twice:
-    once on the rows of the parent table only, and a second time on the
-    rows of the parent table with all of its children.  This second set of
-    statistics is needed when planning queries that traverse the entire
-    inheritance tree.  The child tables themselves are not individually
-    analyzed in this case.
+    <command>ANALYZE</command> gathers two sets of statistics: once on the rows
+    of the parent table only, and a second one including rows of both the parent
+    table and all child relations.  This second set of statistics is needed when
+    planning queries that process the whole inheritance tree at once.  The child
+    tables themselves are not individually analyzed in this case.
   </para>
 
   <para>
-- 
2.31.1

Reply via email to