Hi Horiguchi-san!

On 2019/07/11 19:56, Kyotaro Horiguchi wrote:
Hello.

At Tue, 9 Jul 2019 17:38:44 +0900, Tatsuro Yamada <tatsuro.yamada...@nttcom.co.jp> 
wrote in <244cb241-168b-d6a9-c45f-a80c34cdc...@nttcom.co.jp>
Hi Alvaro, Anthony, Julien and Robert,

On 2019/07/09 3:47, Julien Rouhaud wrote:
On Mon, Jul 8, 2019 at 8:44 PM Robert Haas <robertmh...@gmail.com>
wrote:

On Mon, Jul 8, 2019 at 2:18 PM Alvaro Herrera
<alvhe...@2ndquadrant.com> wrote:
Yeah, I got the impression that that was determined to be the
desirable
behavior, so I made it do that, but I'm not really happy about it
either.  We're not too late to change the CREATE INDEX behavior, but
let's discuss what is it that we want.

I don't think I intended to make any such determination -- which
commit do you think established this as the canonical behavior?

I propose that once a field is set, we should leave it set until the
end.
+1
Note that this patch is already behaving like that if the table only
contains dead rows.

+1 from me.

3785|13599|postgres|16384|f|16384|analyzing complete|0|0 <-- Added and
fixed. :)
---------------------------------------------------------

I have some comments.

+               const int   index[] = {
+                       PROGRESS_ANALYZE_PHASE,
+                       PROGRESS_ANALYZE_TOTAL_BLOCKS,
+                       PROGRESS_ANALYZE_BLOCKS_DONE
+               };
+               const int64 val[] = {
+                       PROGRESS_ANALYZE_PHASE_ANALYSIS,
+                       0, 0

Do you oppose to leaving the total/done blocks alone here:p?


Thanks for your comments!
I created a new patch based on your comments because Alvaro allowed me
to work on revising the patch. :)


Ah, I revised it to remove "0, 0".



+  BlockNumber  nblocks;
+  double    blksdone = 0;

Why is it a double even though blksdone is of the same domain
with nblocks? And finally:

+    pgstat_progress_update_param(PROGRESS_ANALYZE_BLOCKS_DONE,
+                   ++blksdone);

It is converted into int64.


Fixed.
But is it suitable to use BlockNumber instead int64?


+                      WHEN 2 THEN 'analyzing sample'
+                      WHEN 3 THEN 'analyzing sample (extended stats)'

I think we should avoid parenthesized phrases as far as
not-necessary. That makes the column unnecessarily long. The
phase is internally called as "compute stats" so *I* would prefer
something like the followings:

+                      WHEN 2 THEN 'computing stats'
+                      WHEN 3 THEN 'computing extended stats'



+                      WHEN 4 THEN 'analyzing complete'

And if you are intending by this that (as mentioned in the
documentation) "working to complete this analyze", this would be:

+                      WHEN 4 THEN 'completing analyze'
+                      WHEN 4 THEN 'finalizing analyze'


I have no strong opinion, so I changed the phase-names based on
your suggestions like following:

  WHEN 2 THEN 'computing stats'
  WHEN 3 THEN 'computing extended stats'
  WHEN 4 THEN 'finalizing analyze'

However, I'd like to get any comments from hackers to get a consensus
about the names.



+     <entry>Process ID of backend.</entry>

of "the" backend. ? And period is not attached on all
descriptions consisting of a single sentence.

+      <entry>OID of the database to which this backend is connected.</entry>
+      <entry>Name of the database to which this backend is connected.</entry>

"database to which .. is connected" is phrased as "database this
backend is connected to" in pg_stat_acttivity. (Just for consistency)


I checked the sentences on other views of progress monitor (VACUUM,
CREATE INDEX and CLUSTER), and they are same sentence. Therefore,
I'd like to keep it as is. :)



+      <entry>Whether the current scan includes legacy inheritance 
children.</entry>

This apparently excludes partition tables but actually it is
included.

   "Whether scanning through child tables" ?

I'm not sure "child tables" is established as the word to mean
both "partition tables and inheritance children"..


Hmm... I'm also not sure but I fixed as you suggested.



+       The table being scanned (differs from <literal>relid</literal>
+       only when processing partitions or inheritance children).

Is <literal> needed? And I think the parentheses are not needed.

   OID of the table currently being scanned. Can differ from relid
   when analyzing tables that have child tables.


How about:
  OID of the table currently being scanned.
  It might be different from relid when analyzing tables that have child tables.



+       Total number of heap blocks to scan in the current table.

    Number of heap blocks on scanning_table to scan?

It might be better that this description describes that this
and the next column is meaningful only while the phase
"scanning table".


How about:
  Total number of heap blocks in the scanning_table.




+       The command is currently scanning the table.

"sample(s)" comes somewhat abruptly in the next item. Something
like "The command is currently scanning the table
<structfield>scanning_table</structfield> to obtain samples"
might be better.


Fixed.

+       <command>ANALYZE</command> is currently extracting statistical data
+       from the sample obtained.

Something like "The command is computing stats from the samples
obtained in the previous phase" might be better.


Fixed.


Please find attached patch. :)

Here is a test result of the patch.
==========================================================
# select * from pg_stat_progress_analyze ; \watch 0.0001;

9067|13599|postgres|16387|f|16387|scanning table|443|14
9067|13599|postgres|16387|f|16387|scanning table|443|44
9067|13599|postgres|16387|f|16387|scanning table|443|76
9067|13599|postgres|16387|f|16387|scanning table|443|100
...
9067|13599|postgres|16387|f|16387|scanning table|443|443
9067|13599|postgres|16387|f|16387|scanning table|443|443
9067|13599|postgres|16387|f|16387|scanning table|443|443
9067|13599|postgres|16387|f|16387|computing stats|443|443
9067|13599|postgres|16387|f|16387|computing stats|443|443
9067|13599|postgres|16387|f|16387|computing stats|443|443
9067|13599|postgres|16387|f|16387|finalizing analyze|443|443
==========================================================


Thanks,
Tatsuro Yamada

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index bf72d0c..bf09d69 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -361,6 +361,14 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   
11:34   0:00 postgres: ser
      </row>
 
      <row>
+      
<entry><structname>pg_stat_progress_analyze</structname><indexterm><primary>pg_stat_progress_analyze</primary></indexterm></entry>
+      <entry>One row for each backend (including autovacuum worker processes) 
running
+       <command>ANALYZE</command>, showing current progress.
+       See <xref linkend='analyze-progress-reporting'/>.
+      </entry>
+     </row>
+
+     <row>
       
<entry><structname>pg_stat_progress_cluster</structname><indexterm><primary>pg_stat_progress_cluster</primary></indexterm></entry>
       <entry>One row for each backend running
        <command>CLUSTER</command> or <command>VACUUM FULL</command>, showing 
current progress.
@@ -3481,7 +3489,7 @@ SELECT pg_stat_get_backend_pid(s.backendid) AS pid,
    <productname>PostgreSQL</productname> has the ability to report the 
progress of
    certain commands during command execution.  Currently, the only commands
    which support progress reporting are <command>CREATE INDEX</command>,
-   <command>VACUUM</command> and
+   <command>VACUUM</command>, <command>ANALYZE</command> and
    <command>CLUSTER</command>. This may be expanded in the future.
   </para>
 
@@ -3927,6 +3935,131 @@ SELECT pg_stat_get_backend_pid(s.backendid) AS pid,
 
  </sect2>
 
+ <sect2 id="analyze-progress-reporting">
+  <title>ANALYZE Progress Reporting</title>
+
+  <para>
+   Whenever <command>ANALYZE</command> is running, the
+   <structname>pg_stat_progress_analyze</structname> view will contain a
+   row for each backend that is currently running that command.  The tables
+   below describe the information that will be reported and provide
+   information about how to interpret it.
+  </para>
+
+  <table id="pg-stat-progress-analyze-view" 
xreflabel="pg_stat_progress_analyze">
+   <title><structname>pg_stat_progress_analyze</structname> View</title>
+   <tgroup cols="3">
+    <thead>
+    <row>
+      <entry>Column</entry>
+      <entry>Type</entry>
+      <entry>Description</entry>
+     </row>
+    </thead>
+
+   <tbody>
+    <row>
+     <entry><structfield>pid</structfield></entry>
+     <entry><type>integer</type></entry>
+     <entry>Process ID of backend.</entry>
+    </row>
+     <row>
+      <entry><structfield>datid</structfield></entry>
+      <entry><type>oid</type></entry>
+      <entry>OID of the database to which this backend is connected.</entry>
+     </row>
+     <row>
+      <entry><structfield>datname</structfield></entry>
+      <entry><type>name</type></entry>
+      <entry>Name of the database to which this backend is connected.</entry>
+     </row>
+     <row>
+      <entry><structfield>relid</structfield></entry>
+      <entry><type>oid</type></entry>
+      <entry>OID of the table being analyzed.</entry>
+     </row>
+     <row>
+      <entry><structfield>include_children</structfield></entry>
+      <entry><type>boolean</type></entry>
+      <entry>Whether scanning through child tables.</entry>
+     </row>
+     <row>
+      <entry><structfield>scanning_table</structfield></entry>
+      <entry><type>oid</type></entry>
+      <entry>OID of the table currently being scanned.
+        It might be different from relid when analyzing tables that have child 
tables.
+      </entry>
+     </row>
+     <row>
+      <entry><structfield>phase</structfield></entry>
+      <entry><type>text</type></entry>
+      <entry>Current processing phase. See <xref linkend="analyze-phases" 
/></entry>
+     </row>
+     <row>
+     <entry><structfield>heap_blks_total</structfield></entry>
+     <entry><type>bigint</type></entry>
+     <entry>
+       Total number of heap blocks in the scanning_table.
+     </entry>
+     </row>
+    <row>
+     <entry><structfield>heap_blks_scanned</structfield></entry>
+     <entry><type>bigint</type></entry>
+     <entry>
+       Number of heap blocks scanned.
+     </entry>
+    </row>
+   </tbody>
+   </tgroup>
+  </table>
+
+  <table id="analyze-phases">
+   <title>ANALYZE phases</title>
+   <tgroup cols="2">
+    <thead>
+    <row>
+      <entry>Phase</entry>
+      <entry>Description</entry>
+     </row>
+    </thead>
+   <tbody>
+    <row>
+     <entry><literal>initializing</literal></entry>
+     <entry>
+       The command is preparing to begin scanning the heap.  This phase is
+       expected to be very brief.
+     </entry>
+    </row>
+    <row>
+     <entry><literal>scanning table</literal></entry>
+     <entry>
+       The command is currently scanning the 
<structfield>scanning_table</structfield>
+       to obtain samples.
+     </entry>
+    </row>
+    <row>
+     <entry><literal>computing stats</literal></entry>
+     <entry>
+       The command is computing stats from the samples obtained in the 
previous phase.
+     </entry>
+    </row>
+    <row>
+     <entry><literal>computing extended stats</literal></entry>
+     <entry>
+       The command is computing extended stats from the samples obtained in 
the previous phase.
+     </entry>
+    </row>
+     <entry><literal>finalizing analyze</literal></entry>
+     <entry>
+       The command is updating pg_class. When this phase is completed, 
+       <command>ANALYZE</command> will end.
+     </entry>
+    </row>
+   </tbody>
+   </tgroup>
+  </table>
+ </sect2>
+
  <sect2 id="cluster-progress-reporting">
   <title>CLUSTER Progress Reporting</title>
 
diff --git a/src/backend/catalog/system_views.sql 
b/src/backend/catalog/system_views.sql
index ea4c85e..ce3416e 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -964,6 +964,22 @@ CREATE VIEW pg_stat_progress_vacuum AS
     FROM pg_stat_get_progress_info('VACUUM') AS S
         LEFT JOIN pg_database D ON S.datid = D.oid;
 
+CREATE VIEW pg_stat_progress_analyze AS
+    SELECT
+        S.pid AS pid, S.datid AS datid, D.datname AS datname,
+        CAST(S.relid AS oid) AS relid,
+        CAST(CAST(S.param2 AS int) AS boolean) AS include_children,
+        CAST(S.param3 AS oid) AS scanning_table,
+        CASE S.param1 WHEN 0 THEN 'initializing'
+                      WHEN 1 THEN 'scanning table'
+                      WHEN 2 THEN 'computing stats'
+                      WHEN 3 THEN 'computing extended stats'
+                      WHEN 4 THEN 'finalizing analyze'
+                      END AS phase,
+        S.param4 AS heap_blks_total, S.param5 AS heap_blks_scanned
+    FROM pg_stat_get_progress_info('ANALYZE') AS S
+        LEFT JOIN pg_database D ON S.datid = D.oid;
+
 CREATE VIEW pg_stat_progress_cluster AS
     SELECT
         S.pid AS pid,
diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index 8d633f2..19c7042 100644
--- a/src/backend/commands/analyze.c
+++ b/src/backend/commands/analyze.c
@@ -35,6 +35,7 @@
 #include "catalog/pg_namespace.h"
 #include "catalog/pg_statistic_ext.h"
 #include "commands/dbcommands.h"
+#include "commands/progress.h"
 #include "commands/tablecmds.h"
 #include "commands/vacuum.h"
 #include "executor/executor.h"
@@ -251,6 +252,8 @@ analyze_rel(Oid relid, RangeVar *relation,
        LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
        MyPgXact->vacuumFlags |= PROC_IN_ANALYZE;
        LWLockRelease(ProcArrayLock);
+       pgstat_progress_start_command(PROGRESS_COMMAND_ANALYZE,
+                                                                 
RelationGetRelid(onerel));
 
        /*
         * Do the normal non-recursive ANALYZE.  We can skip this for 
partitioned
@@ -275,6 +278,8 @@ analyze_rel(Oid relid, RangeVar *relation,
         */
        relation_close(onerel, NoLock);
 
+       pgstat_progress_end_command();
+
        /*
         * Reset my PGXACT flag.  Note: we need this here, and not in 
vacuum_rel,
         * because the vacuum flag is cleared by the end-of-xact code.
@@ -494,6 +499,18 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
        /*
         * Acquire the sample rows
         */
+       {
+               const int   index[] = {
+                       PROGRESS_ANALYZE_PHASE,
+                       PROGRESS_ANALYZE_INH
+               };
+               const int64 val[] = {
+                       PROGRESS_ANALYZE_PHASE_SCAN_TABLE,
+                       inh
+               };
+
+               pgstat_progress_update_multi_param(2, index, val);
+       }
        rows = (HeapTuple *) palloc(targrows * sizeof(HeapTuple));
        if (inh)
                numrows = acquire_inherited_sample_rows(onerel, elevel,
@@ -513,7 +530,10 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
        if (numrows > 0)
        {
                MemoryContext col_context,
-                                       old_context;
+                                         old_context;
+
+               pgstat_progress_update_param(PROGRESS_ANALYZE_PHASE,
+                                                                        
PROGRESS_ANALYZE_PHASE_COMPUTING);
 
                col_context = AllocSetContextCreate(anl_context,
                                                                                
        "Analyze Column",
@@ -575,10 +595,15 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
                }
 
                /* Build extended statistics (if there are any). */
+               pgstat_progress_update_param(PROGRESS_ANALYZE_PHASE,
+                                                                        
PROGRESS_ANALYZE_PHASE_COMPUTING_EXTENDED);
                BuildRelationExtStatistics(onerel, totalrows, numrows, rows, 
attr_cnt,
                                                                   
vacattrstats);
        }
 
+       pgstat_progress_update_param(PROGRESS_ANALYZE_PHASE,
+                                                                
PROGRESS_ANALYZE_PHASE_FINALIZE);
+
        /*
         * Update pages/tuples stats in pg_class ... but not if we're doing
         * inherited stats.
@@ -1017,6 +1042,8 @@ acquire_sample_rows(Relation onerel, int elevel,
        ReservoirStateData rstate;
        TupleTableSlot *slot;
        TableScanDesc scan;
+       BlockNumber     nblocks;
+       int64           blksdone = 0;
 
        Assert(targrows > 0);
 
@@ -1026,7 +1053,20 @@ acquire_sample_rows(Relation onerel, int elevel,
        OldestXmin = GetOldestXmin(onerel, PROCARRAY_FLAGS_VACUUM);
 
        /* Prepare for sampling block numbers */
-       BlockSampler_Init(&bs, totalblocks, targrows, random());
+       nblocks = BlockSampler_Init(&bs, totalblocks, targrows, random());
+       {
+               const int   index[] = {
+                       PROGRESS_ANALYZE_TOTAL_BLOCKS,
+                       PROGRESS_ANALYZE_SCANREL
+               };
+               const int64 val[] = {
+                       nblocks,
+                       RelationGetRelid(onerel)
+               };
+
+               pgstat_progress_update_multi_param(2, index, val);
+       }
+
        /* Prepare for sampling rows */
        reservoir_init_selection_state(&rstate, targrows);
 
@@ -1087,6 +1127,9 @@ acquire_sample_rows(Relation onerel, int elevel,
 
                        samplerows += 1;
                }
+
+               pgstat_progress_update_param(PROGRESS_ANALYZE_BLOCKS_DONE,
+                                                                        
++blksdone);
        }
 
        ExecDropSingleTupleTableSlot(slot);
diff --git a/src/backend/utils/adt/pgstatfuncs.c 
b/src/backend/utils/adt/pgstatfuncs.c
index 05240bf..db2cc5c 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -469,6 +469,8 @@ pg_stat_get_progress_info(PG_FUNCTION_ARGS)
        /* Translate command name into command type code. */
        if (pg_strcasecmp(cmd, "VACUUM") == 0)
                cmdtype = PROGRESS_COMMAND_VACUUM;
+       else if (pg_strcasecmp(cmd, "ANALYZE") == 0)
+               cmdtype = PROGRESS_COMMAND_ANALYZE;
        else if (pg_strcasecmp(cmd, "CLUSTER") == 0)
                cmdtype = PROGRESS_COMMAND_CLUSTER;
        else if (pg_strcasecmp(cmd, "CREATE INDEX") == 0)
diff --git a/src/backend/utils/misc/sampling.c 
b/src/backend/utils/misc/sampling.c
index d2a1537..f7daece 100644
--- a/src/backend/utils/misc/sampling.c
+++ b/src/backend/utils/misc/sampling.c
@@ -32,8 +32,10 @@
  * Since we know the total number of blocks in advance, we can use the
  * straightforward Algorithm S from Knuth 3.4.2, rather than Vitter's
  * algorithm.
+ *
+ * Returns the number of blocks that BlockSampler_Next will return.
  */
-void
+BlockNumber
 BlockSampler_Init(BlockSampler bs, BlockNumber nblocks, int samplesize,
                                  long randseed)
 {
@@ -48,6 +50,8 @@ BlockSampler_Init(BlockSampler bs, BlockNumber nblocks, int 
samplesize,
        bs->m = 0;                                      /* blocks selected so 
far */
 
        sampler_random_init_state(randseed, bs->randstate);
+
+       return Min(bs->n, bs->N);
 }
 
 bool
diff --git a/src/include/commands/progress.h b/src/include/commands/progress.h
index acd1313..300bb58 100644
--- a/src/include/commands/progress.h
+++ b/src/include/commands/progress.h
@@ -34,6 +34,20 @@
 #define PROGRESS_VACUUM_PHASE_TRUNCATE                 5
 #define PROGRESS_VACUUM_PHASE_FINAL_CLEANUP            6
 
+/* Progress parameters for analyze */
+#define PROGRESS_ANALYZE_PHASE                                 0
+#define PROGRESS_ANALYZE_INH                                   1
+#define PROGRESS_ANALYZE_SCANREL                               2
+#define PROGRESS_ANALYZE_TOTAL_BLOCKS                  3
+#define PROGRESS_ANALYZE_BLOCKS_DONE                   4
+
+/* Phases of analyze (as advertised via PROGRESS_ANALYZE_PHASE) */
+#define PROGRESS_ANALYZE_PHASE_SCAN_TABLE              1
+#define PROGRESS_ANALYZE_PHASE_COMPUTING                       2
+#define PROGRESS_ANALYZE_PHASE_COMPUTING_EXTENDED 3
+#define PROGRESS_ANALYZE_PHASE_FINALIZE                        4
+
+
 /* Progress parameters for cluster */
 #define PROGRESS_CLUSTER_COMMAND                               0
 #define PROGRESS_CLUSTER_PHASE                                 1
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 0a3ad3a..9344654 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -956,6 +956,7 @@ typedef enum ProgressCommandType
 {
        PROGRESS_COMMAND_INVALID,
        PROGRESS_COMMAND_VACUUM,
+       PROGRESS_COMMAND_ANALYZE,
        PROGRESS_COMMAND_CLUSTER,
        PROGRESS_COMMAND_CREATE_INDEX
 } ProgressCommandType;
diff --git a/src/include/utils/sampling.h b/src/include/utils/sampling.h
index 541b507..76d31dc 100644
--- a/src/include/utils/sampling.h
+++ b/src/include/utils/sampling.h
@@ -37,7 +37,7 @@ typedef struct
 
 typedef BlockSamplerData *BlockSampler;
 
-extern void BlockSampler_Init(BlockSampler bs, BlockNumber nblocks,
+extern BlockNumber BlockSampler_Init(BlockSampler bs, BlockNumber nblocks,
                                                          int samplesize, long 
randseed);
 extern bool BlockSampler_HasMore(BlockSampler bs);
 extern BlockNumber BlockSampler_Next(BlockSampler bs);
diff --git a/src/test/regress/expected/rules.out 
b/src/test/regress/expected/rules.out
index 210e9cd..691f46a 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1846,6 +1846,24 @@ pg_stat_gssapi| SELECT s.pid,
     s.gss_princ AS principal,
     s.gss_enc AS encrypted
    FROM pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, 
application_name, state, query, wait_event_type, wait_event, xact_start, 
query_start, backend_start, state_change, client_addr, client_hostname, 
client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, 
sslcipher, sslbits, sslcompression, ssl_client_dn, ssl_client_serial, 
ssl_issuer_dn, gss_auth, gss_princ, gss_enc);
+pg_stat_progress_analyze| SELECT s.pid,
+    s.datid,
+    d.datname,
+    s.relid,
+    ((s.param2)::integer)::boolean AS include_children,
+    (s.param3)::oid AS scanning_table,
+        CASE s.param1
+            WHEN 0 THEN 'initializing'::text
+            WHEN 1 THEN 'scanning table'::text
+            WHEN 2 THEN 'computing stats'::text
+            WHEN 3 THEN 'computing extended stats'::text
+            WHEN 4 THEN 'finalizing analyze'::text
+            ELSE NULL::text
+        END AS phase,
+    s.param4 AS heap_blks_total,
+    s.param5 AS heap_blks_scanned
+   FROM (pg_stat_get_progress_info('ANALYZE'::text) s(pid, datid, relid, 
param1, param2, param3, param4, param5, param6, param7, param8, param9, 
param10, param11, param12, param13, param14, param15, param16, param17, 
param18, param19, param20)
+     LEFT JOIN pg_database d ON ((s.datid = d.oid)));
 pg_stat_progress_cluster| SELECT s.pid,
     s.datid,
     d.datname,

Reply via email to