subject:"Re\: progress report for ANALYZE"

Re: progress report for ANALYZE

2020-01-27 Thread Tatsuro Yamada


Hi,

On 2020/01/24 23:44, Amit Langote wrote:

On Fri, Jan 24, 2020 at 6:47 AM Alvaro Herrera  wrote:

On 2020-Jan-22, Tatsuro Yamada wrote:

P.S.
Next up is progress reporting for query execution?!


Actually, I think it's ALTER TABLE.


+1.  Existing infrastructure might be enough to cover ALTER TABLE's
needs, whereas we will very likely need to build entirely different
infrastructure for tracking the progress for SQL query execution.


Yeah, I agree.
I will create a little POC patch after reading tablecmds.c and alter.c to
investigate how to report progress. :)

Regards,
Tatsuro Yamada

Re: progress report for ANALYZE

2020-01-24 Thread Amit Langote

On Fri, Jan 24, 2020 at 6:47 AM Alvaro Herrera  wrote:
> On 2020-Jan-22, Tatsuro Yamada wrote:
> > P.S.
> > Next up is progress reporting for query execution?!
>
> Actually, I think it's ALTER TABLE.

+1.  Existing infrastructure might be enough to cover ALTER TABLE's
needs, whereas we will very likely need to build entirely different
infrastructure for tracking the progress for SQL query execution.

Thanks,
Amit

Re: progress report for ANALYZE

2020-01-23 Thread Alvaro Herrera

On 2020-Jan-22, Tatsuro Yamada wrote:

> Thanks for reviewing and committing the patch!
> Hope this helps DBA. :-D

I'm sure it does!

> P.S.
> Next up is progress reporting for query execution?!

Actually, I think it's ALTER TABLE.

Also, things like VACUUM could report the progress of the index being
processed ...

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: progress report for ANALYZE

2020-01-23 Thread Michael Paquier

On Wed, Jan 22, 2020 at 03:06:52PM +0900, Amit Langote wrote:
> Oops, really attached this time.

Thanks, applied.  There were clearly two grammar mistakes in the first
patch sent by Justin.  And your suggestions look fine to me.  On top
of that, I have noticed that the indentation of the two tables in the
docs was rather inconsistent.
--
Michael


signature.asc
Description: PGP signature

Re: progress report for ANALYZE

2020-01-21 Thread Amit Langote

On Wed, Jan 22, 2020 at 2:52 PM Amit Langote  wrote:
>
> On Fri, Jan 17, 2020 at 12:19 AM Justin Pryzby  wrote:
> >
> > On Wed, Jan 15, 2020 at 02:11:10PM -0300, Alvaro Herrera wrote:
> > > I just pushed this after some small extra tweaks.
> > >
> > > Thanks, Yamada-san, for seeing this to completion!
> >
> > Find attached minor fixes to docs - sorry I didn't look earlier.
> >
> > Possibly you'd also want to change the other existing instances of 
> > "preparing
> > to begin".
>
> Spotted a few other issues with the docs:
>
> +   Number of computed extended statistics computed.
>
> Should be: "Number of extended statistics computed."
>
> + OID of the child table currently being scanned. This
> field is only valid when
> +   the phase is computing extended statistics.
>
> Should be: "This field is only valid when the phase is
> acquiring inherited sample rows."
>
> +   durring the table scan.
>
> during
>
> Attached a patch.

Oops, really attached this time.

Thanks,
Amit
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 0bfd6151c4..67fbd9660b 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -3592,7 +3592,7 @@ SELECT pg_stat_get_backend_pid(s.backendid) AS pid,
  ext_stats_computed
  bigint
  
-   Number of computed extended statistics computed.  This counter only 
advances when
+   Number of extended statistics computed.  This counter only advances when
the phase is computing extended statistics.
  
 
@@ -3615,7 +3615,7 @@ SELECT pg_stat_get_backend_pid(s.backendid) AS pid,
  current_child_table_relid
  oid
  OID of the child table currently being scanned. This field is only 
valid when
-   the phase is computing extended statistics.
+   the phase is acquiring inherited sample rows.
  
 

@@ -3667,7 +3667,7 @@ SELECT pg_stat_get_backend_pid(s.backendid) AS pid,
  computing extended statistics
  
The command is computing extended statistics from the samples rows 
obtained
-   durring the table scan.
+   during the table scan.

Re: progress report for ANALYZE

2020-01-21 Thread Amit Langote

On Fri, Jan 17, 2020 at 12:19 AM Justin Pryzby  wrote:
>
> On Wed, Jan 15, 2020 at 02:11:10PM -0300, Alvaro Herrera wrote:
> > I just pushed this after some small extra tweaks.
> >
> > Thanks, Yamada-san, for seeing this to completion!
>
> Find attached minor fixes to docs - sorry I didn't look earlier.
>
> Possibly you'd also want to change the other existing instances of "preparing
> to begin".

Spotted a few other issues with the docs:

+   Number of computed extended statistics computed.

Should be: "Number of extended statistics computed."

+ OID of the child table currently being scanned. This
field is only valid when
+   the phase is computing extended statistics.

Should be: "This field is only valid when the phase is
acquiring inherited sample rows."

+   durring the table scan.

during

Attached a patch.

Thanks,
Amit

Re: progress report for ANALYZE

2020-01-21 Thread Tatsuro Yamada


Hi Alvaro and All reviewers,

On 2020/01/16 2:11, Alvaro Herrera wrote:

I just pushed this after some small extra tweaks.

Thanks, Yamada-san, for seeing this to completion!


Thanks for reviewing and committing the patch!
Hope this helps DBA. :-D

P.S.
Next up is progress reporting for query execution?!
To create it, I guess that it needs to improve
progress reporting infrastructure.

Thanks,
Tatsuro Yamada

Re: progress report for ANALYZE

2020-01-16 Thread Justin Pryzby

On Wed, Jan 15, 2020 at 02:11:10PM -0300, Alvaro Herrera wrote:
> I just pushed this after some small extra tweaks.
> 
> Thanks, Yamada-san, for seeing this to completion!

Find attached minor fixes to docs - sorry I didn't look earlier.

Possibly you'd also want to change the other existing instances of "preparing
to begin".
>From de108e69b5d33c881074b0a04697d7061684f823 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Wed, 15 Jan 2020 23:10:29 -0600
Subject: [PATCH v1] Doc review for ANALYZE progress (a166d408)

---
 doc/src/sgml/monitoring.sgml | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 8b44fb1..10871b7 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -3525,7 +3525,7 @@ SELECT pg_stat_get_backend_pid(s.backendid) AS pid,
   
Whenever ANALYZE is running, the
pg_stat_progress_analyze view will contain a
-   row for each backend that is currently running that command.  The tables
+   row for each backend currently running ANALYZE.  The tables
below describe the information that will be reported and provide
information about how to interpret it.
   
@@ -3635,7 +3635,7 @@ SELECT pg_stat_get_backend_pid(s.backendid) AS pid,
 
  initializing
  
-   The command is preparing to begin scanning the heap.  This phase is
+   The command is preparing to scan the heap.  This phase is
expected to be very brief.
  
 
@@ -3643,7 +3643,7 @@ SELECT pg_stat_get_backend_pid(s.backendid) AS pid,
  acquiring sample rows
  
The command is currently scanning the table given by
-   current_relid to obtain sample rows.
+   relid to obtain sample rows.
  
 
 
@@ -3659,14 +3659,14 @@ SELECT pg_stat_get_backend_pid(s.backendid) AS pid,
 
  computing statistics
  
-   The command is computing statistics from the samples rows obtained during
+   The command is computing statistics from the sample rows obtained during
the table scan.
  
 
 
  computing extended statistics
  
-   The command is computing extended statistics from the samples rows obtained
+   The command is computing extended statistics from the sample rows obtained
durring the table scan.
  
 
-- 
2.7.4

Re: progress report for ANALYZE

2020-01-15 Thread Alvaro Herrera

I just pushed this after some small extra tweaks.

Thanks, Yamada-san, for seeing this to completion!

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: progress report for ANALYZE

2019-12-19 Thread Tatsuro Yamada


Hi All,


*All* phases are repeated in this case, not not just "finalizing
analyze", because ANALYZE repeatedly runs for each partition after the
parent partitioned table's ANALYZE finishes.  ANALYZE's documentation
mentions that analyzing a partitioned table also analyzes all of its
partitions, so users should expect to see the progress information for
each partition.  So, I don't think we need to clarify that if only in
one phase's description.  Maybe we can add a note after the phase
description table which mentions this implementation detail about
partitioned tables.  Like this:

   
    
 Note that when ANALYZE is run on a partitioned table,
 all of its partitions are also recursively analyzed as also mentioned on
 .  In that case, ANALYZE
 progress is reported first for the parent table, whereby its inheritance
 statistics are collected, followed by that for each partition.
    
   



Ah.. you are right: All phases are repeated, it shouldn't be fixed
the only one phase's description.



Some more comments on the documentation:

+   Number of computed extended stats.  This counter only advances
when the phase
+   is computing extended stats.

Number of computed extended stats -> Number of extended stats computed



Will fix.



+   Number of analyzed child tables.  This counter only advances
when the phase
+   is computing extended stats.

Regarding, "Number of analyzed child table", note that we don't
"analyze" child tables in this phase, only scan its blocks to collect
samples for parent's ANALYZE.  Also, the 2nd sentence is wrong -- you
meant "when the phase is acquiring inherited sample
rows.  I suggest to write this as follows:

Number of child tables scanned.  This counter only advances when the phase
is acquiring inherited sample rows.



Oops, I will fix it. :)




+ OID of the child table currently being scanned.
+   It might be different from relid when analyzing tables that
have child tables.

I suggest:

OID of the child table currently being scanned.  This field is only valid when
the phase is computing extended stats.



Will fix.



+   The command is currently scanning the
current_relid
+   to obtain samples.

I suggest:

The command is currently scanning the the table given by
current_relid to obtain sample rows.



Will fix.



+   The command is currently scanning the
current_child_table_relid
+   to obtain samples.

I suggest (based on phase description pg_stat_progress_create_index
phase descriptions):

The command is currently scanning child tables to obtain sample rows.  Columns
child_tables_total,
child_tables_done, and
current_child_table_relid contain the progress
information for this phase.



Will fix.



+    
+ computing stats

I think the phase name should really be "computing statistics", that
is, use the full word.



Will fix.



+ 
+   The command is computing stats from the samples obtained
during the table scan.
+ 
+    

So I suggest:

The command is computing statistics from the sample rows obtained during
the table scan



Will fix.



+ computing extended stats
+ 
+   The command is computing extended stats from the samples
obtained in the previous phase.
+ 

I suggest:

The command is computing extended statistics from the sample rows obtained
during the table scan.



Will fix.



I fixed the document based on Amit's comments. :)
Please find attached file.


Thanks,
Tatsuro Yamadas







diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index a3c5f86b7e..a5da126b8f 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -360,6 +360,14 @@ postgres   27093  0.0  0.0  30096  2752 ?Ss   
11:34   0:00 postgres: ser
   
  
 
+ 
+  
pg_stat_progress_analyzepg_stat_progress_analyze
+  One row for each backend (including autovacuum worker processes) 
running
+   ANALYZE, showing current progress.
+   See .
+  
+ 
+
  
   
pg_stat_progress_clusterpg_stat_progress_cluster
   One row for each backend running
@@ -3502,7 +3510,7 @@ SELECT pg_stat_get_backend_pid(s.backendid) AS pid,
PostgreSQL has the ability to report the 
progress of
certain commands during command execution.  Currently, the only commands
which support progress reporting are CREATE INDEX,
-   VACUUM and
+   VACUUM, ANALYZE and
CLUSTER. This may be expanded in the future.
   
 
@@ -3948,6 +3956,179 @@ SELECT pg_stat_get_backend_pid(s.backendid) AS pid,
 
  
 
+ 
+  ANALYZE Progress Reporting
+
+  
+   Whenever ANALYZE is running, the
+   pg_stat_progress_analyze view will contain a
+   row for each backend that is currently running that command.  The tables
+   below describe the information that will be reported and provide
+   information about how to interpret it.
+  
+
+  
+   pg_stat_progress_analyze View
+   
+
+
+  Column
+  Type
+  Description
+ 
+
+
+   
+

Re: progress report for ANALYZE

2019-12-17 Thread Tatsuro Yamada


Hi Amit-san,



  >>> 1)

For now, I'm not sure it should be set current_child_table_relid to zero
when the current phase is changed from "acquiring inherited sample rows" to
"computing stats". See  bellow.


In the upthread discussion [1], Robert asked to *not* do such things,
that is, resetting some values due to phase change.  I'm not sure his
point applies to this case too though.


//Below "computing stats" is for the partitioned table hoge,
//therefore the second column from the left side would be
//better to set Zero to easy to understand.


On second thought, maybe we should not reset, because that might be
considered loss of information.  To avoid confusion, we should simply
document that current_child_table_relid is only valid during the
"acquiring inherited sample rows" phase.



Okay, agreed. :)

 

2)
There are many "finalizing analyze" phases based on relids in the case
of partitioning tables. Would it better to fix the document? or it
would be better to reduce it to one?


It would be better to modify the document of "finalizing analyze" phase.

# Before modify
 The command is updating pg_class. When this phase is completed,
 ANALYZE will end.

# Modified
 The command is updating pg_class. When this phase is completed,
 ANALYZE will end. In the case of partitioned table,
 it might be shown on each partitions.

What do you think that? I'm going to fix it, if you agreed. :)


*All* phases are repeated in this case, not not just "finalizing
analyze", because ANALYZE repeatedly runs for each partition after the
parent partitioned table's ANALYZE finishes.  ANALYZE's documentation
mentions that analyzing a partitioned table also analyzes all of its
partitions, so users should expect to see the progress information for
each partition.  So, I don't think we need to clarify that if only in
one phase's description.  Maybe we can add a note after the phase
description table which mentions this implementation detail about
partitioned tables.  Like this:

   

 Note that when ANALYZE is run on a partitioned table,
 all of its partitions are also recursively analyzed as also mentioned on
 .  In that case, ANALYZE
 progress is reported first for the parent table, whereby its inheritance
 statistics are collected, followed by that for each partition.

   



Ah.. you are right: All phases are repeated, it shouldn't be fixed
the only one phase's description.



Some more comments on the documentation:

+   Number of computed extended stats.  This counter only advances
when the phase
+   is computing extended stats.

Number of computed extended stats -> Number of extended stats computed



Will fix.

 

+   Number of analyzed child tables.  This counter only advances
when the phase
+   is computing extended stats.

Regarding, "Number of analyzed child table", note that we don't
"analyze" child tables in this phase, only scan its blocks to collect
samples for parent's ANALYZE.  Also, the 2nd sentence is wrong -- you
meant "when the phase is acquiring inherited sample
rows.  I suggest to write this as follows:

Number of child tables scanned.  This counter only advances when the phase
is acquiring inherited sample rows.



Oops, I will fix it. :)


 

+ OID of the child table currently being scanned.
+   It might be different from relid when analyzing tables that
have child tables.

I suggest:

OID of the child table currently being scanned.  This field is only valid when
the phase is computing extended stats.



Will fix.



+   The command is currently scanning the
current_relid
+   to obtain samples.

I suggest:

The command is currently scanning the the table given by
current_relid to obtain sample rows.



Will fix.

 

+   The command is currently scanning the
current_child_table_relid
+   to obtain samples.

I suggest (based on phase description pg_stat_progress_create_index
phase descriptions):

The command is currently scanning child tables to obtain sample rows.  Columns
child_tables_total,
child_tables_done, and
current_child_table_relid contain the progress
information for this phase.



Will fix.



+
+ computing stats

I think the phase name should really be "computing statistics", that
is, use the full word.



Will fix.

 

+ 
+   The command is computing stats from the samples obtained
during the table scan.
+ 
+

So I suggest:

The command is computing statistics from the sample rows obtained during
the table scan



Will fix.

 

+ computing extended stats
+ 
+   The command is computing extended stats from the samples
obtained in the previous phase.
+ 

I suggest:

The command is computing extended statistics from the sample rows obtained
during the table scan.



Will fix.


Thanks for your above useful suggestions. It helps me a lot. :)


Cheers!
Tatsuro Yamada

Re: progress report for ANALYZE

2019-12-09 Thread Amit Langote

Yamada-san,

On Fri, Dec 6, 2019 at 3:24 PM Tatsuro Yamada
 wrote:
 >>> 1)
> >>> For now, I'm not sure it should be set current_child_table_relid to zero
> >>> when the current phase is changed from "acquiring inherited sample rows" 
> >>> to
> >>> "computing stats". See  bellow.
> >>
> >> In the upthread discussion [1], Robert asked to *not* do such things,
> >> that is, resetting some values due to phase change.  I'm not sure his
> >> point applies to this case too though.
>
> //Below "computing stats" is for the partitioned table hoge,
> //therefore the second column from the left side would be
> //better to set Zero to easy to understand.

On second thought, maybe we should not reset, because that might be
considered loss of information.  To avoid confusion, we should simply
document that current_child_table_relid is only valid during the
"acquiring inherited sample rows" phase.

> >>> 2)
> >>> There are many "finalizing analyze" phases based on relids in the case
> >>> of partitioning tables. Would it better to fix the document? or it
> >>> would be better to reduce it to one?
> >>>
> It would be better to modify the document of "finalizing analyze" phase.
>
># Before modify
> The command is updating pg_class. When this phase is completed,
> ANALYZE will end.
>
># Modified
> The command is updating pg_class. When this phase is completed,
> ANALYZE will end. In the case of partitioned table,
> it might be shown on each partitions.
>
> What do you think that? I'm going to fix it, if you agreed. :)

*All* phases are repeated in this case, not not just "finalizing
analyze", because ANALYZE repeatedly runs for each partition after the
parent partitioned table's ANALYZE finishes.  ANALYZE's documentation
mentions that analyzing a partitioned table also analyzes all of its
partitions, so users should expect to see the progress information for
each partition.  So, I don't think we need to clarify that if only in
one phase's description.  Maybe we can add a note after the phase
description table which mentions this implementation detail about
partitioned tables.  Like this:

Note that when ANALYZE is run on a partitioned table,
all of its partitions are also recursively analyzed as also mentioned on
.  In that case, ANALYZE
progress is reported first for the parent table, whereby its inheritance
statistics are collected, followed by that for each partition.

Some more comments on the documentation:

+   Number of computed extended stats.  This counter only advances
when the phase
+   is computing extended stats.

Number of computer extended stats -> Number of extended stats computed

+   Number of analyzed child tables.  This counter only advances
when the phase
+   is computing extended stats.

Regarding, "Number of analyzed child table", note that we don't
"analyze" child tables in this phase, only scan its blocks to collect
samples for parent's ANALYZE.  Also, the 2nd sentence is wrong -- you
meant "when the phase is acquiring inherited sample
rows.  I suggest to write this as follows:

Number of child tables scanned.  This counter only advances when the phase
is acquiring inherited sample rows.

+ OID of the child table currently being scanned.
+   It might be different from relid when analyzing tables that
have child tables.

I suggest:

OID of the child table currently being scanned.  This field is only valid when
the phase is computing extended stats.

+   The command is currently scanning the
current_relid
+   to obtain samples.

I suggest:

The command is currently scanning the the table given by
current_relid to obtain sample rows.

+   The command is currently scanning the
current_child_table_relid
+   to obtain samples.

I suggest (based on phase description pg_stat_progress_create_index
phase descriptions):

The command is currently scanning child tables to obtain sample rows.  Columns
child_tables_total,
child_tables_done, and
current_child_table_relid contain the progress
information for this phase.

+
+ computing stats

I think the phase name should really be "computing statistics", that
is, use the full word.

+ 
+   The command is computing stats from the samples obtained
during the table scan.
+ 
+

So I suggest:

The command is computing statistics from the sample rows obtained during
the table scan

+ computing extended stats
+ 
+   The command is computing extended stats from the samples
obtained in the previous phase.
+ 

I suggest:

The command is computing extended statistics from the sample rows obtained
during the table scan.

Thanks,
Amit

Re: progress report for ANALYZE

2019-12-05 Thread Tatsuro Yamada


Hi Amit-san,



I wonder two things below. What do you think?

1)
For now, I'm not sure it should be set current_child_table_relid to zero
when the current phase is changed from "acquiring inherited sample rows" to
"computing stats". See  bellow.


In the upthread discussion [1], Robert asked to *not* do such things,
that is, resetting some values due to phase change.  I'm not sure his
point applies to this case too though.


Yeah, I understood.
I'll check target relid of "computing stats" to re-read a code of
analyze command later. :)



Finally, I understood after investigation of the code. :)
Call stack is the following, and analyze_rel() calls "N + 1" times
for partitioned table and each partitions.

analyze_rel start
 do_analyze_rel inh==true start
  onerel: hoge2
   acq_inh_sample_rows start
childrel: hoge2_1
childrel: hoge2_2
childrel: hoge2_3
childrel: hoge2_default
   acq_inh_sample_rows end
   compute_stats start
   compute_stats end
   compute_index_stats start
   compute_index_stats end
   finalizing start
   finalizing end
 do_analyze_rel inh==true end
analyze_rel end
...


Also, I checked my test result. ("//" is my comments)


# select oid,relname,relkind from pg_class where relname like 'hoge2%';
  oid  |relname| relkind
---+---+-
 36081 | hoge2 | p
 36084 | hoge2_1   | r
 36087 | hoge2_2   | r
 36090 | hoge2_3   | r
 36093 | hoge2_default | r
(6 rows)

# select relid,
 current_child_table_relid,
 phase,
 sample_blks_total,
 sample_blks_scanned,
 ext_stats_total,
 ext_stats_computed,
 child_tables_total,
 child_tables_done
  from pg_stat_progress_analyze; \watch 0.1

== for partitioned table hoge2 ==
//hoge2_1
36081|36084|acquiring inherited sample rows|45|20|0|0|4|0
36081|36084|acquiring inherited sample rows|45|42|0|0|4|0
36081|36084|acquiring inherited sample rows|45|45|0|0|4|0
36081|36084|acquiring inherited sample rows|45|45|0|0|4|0

//hoge2_2
36081|36087|acquiring inherited sample rows|45|3|0|0|4|1
36081|36087|acquiring inherited sample rows|45|31|0|0|4|1
36081|36087|acquiring inherited sample rows|45|45|0|0|4|1
36081|36087|acquiring inherited sample rows|45|45|0|0|4|1

//hoge2_3
36081|36090|acquiring inherited sample rows|45|12|0|0|4|2
36081|36090|acquiring inherited sample rows|45|35|0|0|4|2
36081|36090|acquiring inherited sample rows|45|45|0|0|4|2
36081|36090|acquiring inherited sample rows|45|45|0|0|4|2

//hoge2_default
36081|36093|acquiring inherited sample rows|45|18|0|0|4|3
36081|36093|acquiring inherited sample rows|45|38|0|0|4|3
36081|36093|acquiring inherited sample rows|45|45|0|0|4|3
36081|36093|acquiring inherited sample rows|45|45|0|0|4|3

//Below "computing stats" is for the partitioned table hoge,
//therefore the second column from the left side would be
//better to set Zero to easy to understand.
//I guessd that user think which relid is the target of
//"computing stats"?!
//Of course, other option is to write it on document.

36081|36093|computing stats |45|45|0|0|4|4
36081|36093|computing stats |45|45|0|0|4|4
36081|36093|computing stats |45|45|0|0|4|4
36081|36093|computing stats |45|45|0|0|4|4
36081|36093|finalizing analyze  |45|45|0|0|4|4

== for each partitions such as hoge2_1 ... hoge2_default ==

//hoge2_1
36084|0|acquiring sample rows   |45|25|0|0|0|0
36084|0|computing stats |45|45|0|0|0|0
36084|0|computing extended stats|45|45|0|0|0|0
36084|0|finalizing analyze  |45|45|0|0|0|0

//hoge2_2
36087|0|acquiring sample rows   |45|14|0|0|0|0
36087|0|computing stats |45|45|0|0|0|0
36087|0|computing extended stats|45|45|0|0|0|0
36087|0|finalizing analyze  |45|45|0|0|0|0

//hoge2_3
36090|0|acquiring sample rows   |45|12|0|0|0|0
36090|0|acquiring sample rows   |45|44|0|0|0|0
36090|0|computing extended stats|45|45|0|0|0|0
36090|0|finalizing analyze  |45|45|0|0|0|0

//hoge2_default
36093|0|acquiring sample rows   |45|10|0|0|0|0
36093|0|acquiring sample rows   |45|43|0|0|0|0
36093|0|computing extended stats|45|45|0|0|0|0
36093|0|finalizing analyze  |45|45|0|0|0|0




2)
There are many "finalizing analyze" phases based on relids in the case
of partitioning tables. Would it better to fix the document? or it
would be better to reduce it to one?


-
   finalizing analyze
   
 The command is updating pg_class. When this phase is completed,
 ANALYZE will end.
-


When a partitioned table is analyzed, its partitions are analyzed too.
So, the ANALYZE command effectively runs N + 1 times if there are N
partitions -- first analyze partitioned table to collect "inherited"
statistics by collecting row samples using
acquire_inherited_sample_rows(), then each partition to collect its
own statistics.  Note that this recursive

Re: progress report for ANALYZE

2019-12-04 Thread Tatsuro Yamada


Hi Amit-san,

Thanks for your comments!


Attached patch is the revised patch. :)

I wonder two things below. What do you think?

1)
For now, I'm not sure it should be set current_child_table_relid to zero
when the current phase is changed from "acquiring inherited sample rows" to
"computing stats". See  bellow.


In the upthread discussion [1], Robert asked to *not* do such things,
that is, resetting some values due to phase change.  I'm not sure his
point applies to this case too though.


Yeah, I understood.
I'll check target relid of "computing stats" to re-read a code of
analyze command later. :)

 

2)
There are many "finalizing analyze" phases based on relids in the case
of partitioning tables. Would it better to fix the document? or it
would be better to reduce it to one?


-
   finalizing analyze
   
 The command is updating pg_class. When this phase is completed,
 ANALYZE will end.
-


When a partitioned table is analyzed, its partitions are analyzed too.
So, the ANALYZE command effectively runs N + 1 times if there are N
partitions -- first analyze partitioned table to collect "inherited"
statistics by collecting row samples using
acquire_inherited_sample_rows(), then each partition to collect its
own statistics.  Note that this recursive application to ANALYZE to
partitions (child tables) only occurs for partitioned tables, not for
legacy inheritance.


Thanks for your explanation.
I understand Analyzing Partitioned table a little.
Below is my understanding. Is it right?

==
In the case of partitioned table (N = 3)

 - Partitioned table name: p (relid is 100)
 - Partitioning table names: p1, p2, p3 (relids are 201, 202 and 203)

For now, We can get the following results by executing "analyze p;".

Num Phase relid current_child_table_relid
 1  acquire inherited sample rows  100   201
 2  acquire inherited sample rows  100   202
 3  acquire inherited sample rows  100   203
 4  computing stats100   0
 5  finalizing analyze 100   0

 6  acquiring sample rows  201   0
 7  computing stats201   0
 8  finalizing analyze 201   0

 9  acquiring sample rows  202   0
10  computing stats202   0
11  finalizing analyze 202   0

12  acquiring sample rows  203   0
13  computing stats203   0
14  finalizing analyze 203   0
==


Thanks,
Tatsuro Yamada

Re: progress report for ANALYZE

2019-11-29 Thread Amit Langote

Yamada-san,

On Fri, Nov 29, 2019 at 5:45 PM Tatsuro Yamada
 wrote:
> Attached patch is the revised patch. :)
>
> I wonder two things below. What do you think?
>
> 1)
> For now, I'm not sure it should be set current_child_table_relid to zero
> when the current phase is changed from "acquiring inherited sample rows" to
> "computing stats". See  bellow.

In the upthread discussion [1], Robert asked to *not* do such things,
that is, resetting some values due to phase change.  I'm not sure his
point applies to this case too though.

> 2)
> There are many "finalizing analyze" phases based on relids in the case
> of partitioning tables. Would it better to fix the document? or it
> would be better to reduce it to one?
>
> 
> -
>   finalizing analyze
>   
> The command is updating pg_class. When this phase is completed,
> ANALYZE will end.
> -

When a partitioned table is analyzed, its partitions are analyzed too.
So, the ANALYZE command effectively runs N + 1 times if there are N
partitions -- first analyze partitioned table to collect "inherited"
statistics by collecting row samples using
acquire_inherited_sample_rows(), then each partition to collect its
own statistics.  Note that this recursive application to ANALYZE to
partitions (child tables) only occurs for partitioned tables, not for
legacy inheritance.

Thanks,
Amit

Re: progress report for ANALYZE

2019-11-29 Thread Tatsuro Yamada


Hi Alvaro and Amit!

On 2019/11/29 9:54, Tatsuro Yamada wrote:

Hi Alvaro!


Hmmm... I understand your opinion but I'd like to get more opinions too. :)
Do you prefer these column names? See below:


Here's my take on it:

  
   pid
   datid
   datname
   relid
   phase
   sample_blks_total
   sample_blks_scanned
   ext_stats_total
   ext_stats_computed
   child_tables_total
   child_tables_done
   current_child_table_relid

It seems to make sense to keep using the "child table" terminology in
that last column; but since the column carries an OID then as Robert
said it should have "relid" in the name.  For the other two "child
tables" columns, not using "relid" is appropriate because what they have
is not relids.

I think there should be an obvious correspondence in columns that are
closely related, which there isn't if you use "sample" in one and "heap"
in the other, so I'd go for "sample" in both.



Thanks for the comment.
Oops, You are right, I overlooked they are not relids..
I agreed with you and Amit's opinion so I'll send a revised patch on the next 
mail. :-)

Next patch will be included:
  - New columns definition of the view (as above)
  - Renamed the phase name: s/acquiring inh sample rows/acquiring inherited 
sample rows/
  - Update document


Attached patch is the revised patch. :)

I wonder two things below. What do you think?

1)
For now, I'm not sure it should be set current_child_table_relid to zero
when the current phase is changed from "acquiring inherited sample rows" to
"computing stats". See  bellow.

2)
There are many "finalizing analyze" phases based on relids in the case
of partitioning tables. Would it better to fix the document? or it
would be better to reduce it to one?


-
 finalizing analyze
 
   The command is updating pg_class. When this phase is completed,
   ANALYZE will end.
-



-
# \d pg_stat_progress_analyze
  View "pg_catalog.pg_stat_progress_analyze"
  Column   |  Type   | Collation | Nullable | Default
---+-+---+--+-
 pid   | integer |   |  |
 datid | oid |   |  |
 datname   | name|   |  |
 relid | oid |   |  |
 phase | text|   |  |
 sample_blks_total | bigint  |   |  |
 sample_blks_scanned   | bigint  |   |  |
 ext_stats_total   | bigint  |   |  |
 ext_stats_computed| bigint  |   |  |
 child_tables_total| bigint  |   |  |
 child_tables_done | bigint  |   |  |
 current_child_table_relid | oid |   |  |
-




-
# select * from pg_stat_progress_analyze ; \watch 0.0001

19309|13583|postgres|36081|acquiring inherited sample rows|0|0|0|0|0|0|0
19309|13583|postgres|36081|acquiring inherited sample rows|45|17|0|0|4|0|36084
19309|13583|postgres|36081|acquiring inherited sample rows|45|35|0|0|4|0|36084
19309|13583|postgres|36081|acquiring inherited sample rows|45|45|0|0|4|0|36084
19309|13583|postgres|36081|acquiring inherited sample rows|45|45|0|0|4|0|36084
19309|13583|postgres|36081|acquiring inherited sample rows|45|45|0|0|4|0|36084
19309|13583|postgres|36081|acquiring inherited sample rows|45|3|0|0|4|1|36087
19309|13583|postgres|36081|acquiring inherited sample rows|45|22|0|0|4|1|36087
19309|13583|postgres|36081|acquiring inherited sample rows|45|38|0|0|4|1|36087
19309|13583|postgres|36081|acquiring inherited sample rows|45|45|0|0|4|1|36087
19309|13583|postgres|36081|acquiring inherited sample rows|45|45|0|0|4|1|36087
19309|13583|postgres|36081|acquiring inherited sample rows|45|45|0|0|4|1|36087
19309|13583|postgres|36081|acquiring inherited sample rows|45|16|0|0|4|2|36090
19309|13583|postgres|36081|acquiring inherited sample rows|45|34|0|0|4|2|36090
19309|13583|postgres|36081|acquiring inherited sample rows|45|45|0|0|4|2|36090
19309|13583|postgres|36081|acquiring inherited sample rows|45|45|0|0|4|2|36090
19309|13583|postgres|36081|acquiring inherited sample rows|45|45|0|0|4|2|36090
19309|13583|postgres|36081|acquiring inherited sample rows|45|10|0|0|4|3|36093
19309|13583|postgres|36081|acquiring inherited sample rows|45|29|0|0|4|3|36093
19309|13583|postgres|36081|acquiring inherited sample rows|45|43|0|0|4|3|36093
19309|13583|postgres|36081|acquiring inherited sample rows|45|45|0|0|4|3|36093
19309|13583|postgres|36081|acquiring inherited sample rows|45|45|0|0|4|3|36093
19309|13583|postgres|36081|acquiring inherited sample rows|45|45|0|0|4|3|36093

Re: progress report for ANALYZE

2019-11-28 Thread Tatsuro Yamada


Hi Michael,

On 2019/11/27 13:25, Michael Paquier wrote:

On Wed, Nov 27, 2019 at 12:45:41PM +0900, Tatsuro Yamada wrote:

Fixed.


Patch was waiting on input from author, so I have switched it back to
"Needs review", and moved it to next CF while on it as you are working
on it.


Thanks for your CF manager work.
I will do my best to be committed at the next CF because
Progress reporting feature is useful for DBAs, as you know. :)


Thanks,
Tatsuro Yamada

Re: progress report for ANALYZE

2019-11-28 Thread Tatsuro Yamada


Hi Alvaro!


Hmmm... I understand your opinion but I'd like to get more opinions too. :)
Do you prefer these column names? See below:


Here's my take on it:

  
   pid
   datid
   datname
   relid
   phase
   sample_blks_total
   sample_blks_scanned
   ext_stats_total
   ext_stats_computed
   child_tables_total
   child_tables_done
   current_child_table_relid

It seems to make sense to keep using the "child table" terminology in
that last column; but since the column carries an OID then as Robert
said it should have "relid" in the name.  For the other two "child
tables" columns, not using "relid" is appropriate because what they have
is not relids.

I think there should be an obvious correspondence in columns that are
closely related, which there isn't if you use "sample" in one and "heap"
in the other, so I'd go for "sample" in both.



Thanks for the comment.
Oops, You are right, I overlooked they are not relids..
I agreed with you and Amit's opinion so I'll send a revised patch on the next 
mail. :-)

Next patch will be included:
 - New columns definition of the view (as above)
 - Renamed the phase name: s/acquiring inh sample rows/acquiring inherited 
sample rows/
 - Update document


Thanks,
Tatsuro Yamada

Re: progress report for ANALYZE

2019-11-28 Thread Alvaro Herrera

On 2019-Nov-28, Tatsuro Yamada wrote:

> Hmmm... I understand your opinion but I'd like to get more opinions too. :)
> Do you prefer these column names? See below:

Here's my take on it:

  pid
  datid
  datname
  relid
  phase
  sample_blks_total
  sample_blks_scanned
  ext_stats_total
  ext_stats_computed
  child_tables_total
  child_tables_done
  current_child_table_relid

It seems to make sense to keep using the "child table" terminology in
that last column; but since the column carries an OID then as Robert
said it should have "relid" in the name.  For the other two "child
tables" columns, not using "relid" is appropriate because what they have
is not relids.

I think there should be an obvious correspondence in columns that are
closely related, which there isn't if you use "sample" in one and "heap"
in the other, so I'd go for "sample" in both.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: progress report for ANALYZE

2019-11-28 Thread Tatsuro Yamada


Hi Amit-san,

On 2019/11/28 10:59, Amit Langote wrote:

Yamada-san,

Thank you for updating the patch.

On Wed, Nov 27, 2019 at 12:46 PM Tatsuro Yamada
 wrote:

But I just remembered I replaced column name "*_table" with "*_relid"
based on Robert's comment three months ago, see below:


/me reviews.

+  scanning_table

I think this should be retitled to something that ends in 'relid',
like all of the corresponding cases in existing progress views.
Perhaps 'active_relid' or 'current_relid'.


So, it would be better to use same rule against child_tables_total and
child_tables_done. Thus I changed these column names to others and added
to the view.


Robert's comment seems OK for a column that actually reports an OID,
but for columns that report counts, names containing "relids" sound a
bit strange to me.  So, I prefer child_tables_total /
child_tables_done over child_relids_total / child_relids_done.  Would
like to hear more opinions.


Hmmm... I understand your opinion but I'd like to get more opinions too. :)
Do you prefer these column names? See below:


 pid
 datid
 datname
 relid
 phase
 sample_blks_total
 heap_blks_scanned
 ext_stats_total
 ext_stats_computed
 child_tables_total  <= Renamed: s/relid/table/
 child_tables_done   <= Renamed: s/relid/table/
 current_child_table <= Renamed: s/relid/table/




Also, inheritance tree stats are created *after* creating single table
stats, so I think that it would be better to have a distinct phase
name for that, say "acquiring inherited sample rows".  In
do_analyze_rel(), you can select which of two phases to set based on
whether inh is true or not.


Okay! I'll also add the new phase "acquiring inherited sample rows" on
the next patch.



Fixed.

I tried to abbreviate it to "acquiring inh sample rows" because I thought
"acquiring inherited sample rows" is a little long for the phase name.


I think phase names should contain full English words, because they
are supposed to be descriptive.  Users are very likely to not
understand what "inh" means without looking up the docs.



Okay, I will fix it.
   s/acquiring inh sample rows/acquiring inherited sample rows/


Thanks,
Tatsuro Yamada

Re: progress report for ANALYZE

2019-11-27 Thread Amit Langote

Yamada-san,

Thank you for updating the patch.

On Wed, Nov 27, 2019 at 12:46 PM Tatsuro Yamada
 wrote:
> But I just remembered I replaced column name "*_table" with "*_relid"
> based on Robert's comment three months ago, see below:
>
> > /me reviews.
> >
> > +  scanning_table
> >
> > I think this should be retitled to something that ends in 'relid',
> > like all of the corresponding cases in existing progress views.
> > Perhaps 'active_relid' or 'current_relid'.
>
> So, it would be better to use same rule against child_tables_total and
> child_tables_done. Thus I changed these column names to others and added
> to the view.

Robert's comment seems OK for a column that actually reports an OID,
but for columns that report counts, names containing "relids" sound a
bit strange to me.  So, I prefer child_tables_total /
child_tables_done over child_relids_total / child_relids_done.  Would
like to hear more opinions.

> >> Also, inheritance tree stats are created *after* creating single table
> >> stats, so I think that it would be better to have a distinct phase
> >> name for that, say "acquiring inherited sample rows".  In
> >> do_analyze_rel(), you can select which of two phases to set based on
> >> whether inh is true or not.
> >
> > Okay! I'll also add the new phase "acquiring inherited sample rows" on
> > the next patch.
>
>
> Fixed.
>
> I tried to abbreviate it to "acquiring inh sample rows" because I thought
> "acquiring inherited sample rows" is a little long for the phase name.

I think phase names should contain full English words, because they
are supposed to be descriptive.  Users are very likely to not
understand what "inh" means without looking up the docs.

Thanks,
Amit

Re: progress report for ANALYZE

2019-11-26 Thread Michael Paquier

On Wed, Nov 27, 2019 at 12:45:41PM +0900, Tatsuro Yamada wrote:
> Fixed.

Patch was waiting on input from author, so I have switched it back to
"Needs review", and moved it to next CF while on it as you are working
on it.
--
Michael


signature.asc
Description: PGP signature

Re: progress report for ANALYZE

2019-11-26 Thread Tatsuro Yamada


Hi Amit-san!


I think include_children and current_relid are not enough to
understand the progress of analyzing inheritance trees, because even
with current_relid being updated, I can't tell how many more there
will be.  I think it'd be better to show the total number of children
and the number of children processed, just like
pg_stat_progress_create_index does for partitions.  So, instead of
include_children and current_relid, I think it's better to have
child_tables_total, child_tables_done, and current_child_relid, placed
last in the set of columns.


Ah, I understood.
I'll check pg_stat_progress_create_index does for partitions,
and will create a new patch. 


Fixed.

But I just remembered I replaced column name "*_table" with "*_relid"
based on Robert's comment three months ago, see below:


/me reviews.

+  scanning_table

I think this should be retitled to something that ends in 'relid',
like all of the corresponding cases in existing progress views.
Perhaps 'active_relid' or 'current_relid'.


So, it would be better to use same rule against child_tables_total and
child_tables_done. Thus I changed these column names to others and added
to the view. I also removed include_children and current_relid.
The following columns are new version.


 pid
 datid
 datname
 relid
 phase
 sample_blks_total
 heap_blks_scanned
 ext_stats_total <= Added (based on Alvaro's comment)
 ext_stats_computed  <= Renamed
 child_relids_total  <= Added
 child_relids_done   <= Added
 current_child_relid <= Added



Also, inheritance tree stats are created *after* creating single table
stats, so I think that it would be better to have a distinct phase
name for that, say "acquiring inherited sample rows".  In
do_analyze_rel(), you can select which of two phases to set based on
whether inh is true or not.  For partitioned tables, the progress
output will immediately switch to this phase, because partitioned
table itself is empty so there's nothing to do in the "acquiring
sample rows" phase.

That's all for now.



Okay! I'll also add the new phase "acquiring inherited sample rows" on
the next patch. 



Fixed.

I tried to abbreviate it to "acquiring inh sample rows" because I thought
"acquiring inherited sample rows" is a little long for the phase name.

Attached WIP patch is including these fixes:
  - Remove columns: include_children and current_relid
  - Add new columns: child_relieds_total, child_relids_done and 
current_child_relid
  - Add new phase "acquiring inh sample rows"

  Note: the document is not updated, I'll fix it later. :)

Attached testcase.sql is for creating base table and partitioning table.
You can check the patch by using the following procedures, easily.

Terminal #1
--
\a \t
select * from pg_stat_progress_analyze; \watch 0.0001
--

Terminal #2
--
\i testcase.sql
analyze hoge;
analyze hoge2;
--

Thanks,
Tatsuro Yamada



diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index a3c5f86b7e..462d3a5415 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -360,6 +360,14 @@ postgres   27093  0.0  0.0  30096  2752 ?Ss   
11:34   0:00 postgres: ser
   
  
 
+ 
+  
pg_stat_progress_analyzepg_stat_progress_analyze
+  One row for each backend (including autovacuum worker processes) 
running
+   ANALYZE, showing current progress.
+   See .
+  
+ 
+
  
   
pg_stat_progress_clusterpg_stat_progress_cluster
   One row for each backend running
@@ -3502,7 +3510,7 @@ SELECT pg_stat_get_backend_pid(s.backendid) AS pid,
PostgreSQL has the ability to report the 
progress of
certain commands during command execution.  Currently, the only commands
which support progress reporting are CREATE INDEX,
-   VACUUM and
+   VACUUM, ANALYZE and
CLUSTER. This may be expanded in the future.
   
 
@@ -3948,6 +3956,140 @@ SELECT pg_stat_get_backend_pid(s.backendid) AS pid,
 
  
 
+ 
+  ANALYZE Progress Reporting
+
+  
+   Whenever ANALYZE is running, the
+   pg_stat_progress_analyze view will contain a
+   row for each backend that is currently running that command.  The tables
+   below describe the information that will be reported and provide
+   information about how to interpret it.
+  
+
+  
+   pg_stat_progress_analyze View
+   
+
+
+  Column
+  Type
+  Description
+ 
+
+
+   
+
+ pid
+ integer
+ Process ID of backend.
+
+ 
+  datid
+  oid
+  OID of the database to which this backend is connected.
+ 
+ 
+  datname
+  name
+  Name of the database to which this backend is connected.
+ 
+ 
+  relid
+  oid
+  OID of the table being analyzed.
+ 
+ 
+  include_children
+  boolean
+  Whether scanning through child tables.
+ 
+ 
+  current_relid
+  oid
+  OID of the table currently being scanned.
+It might be different

Re: progress report for ANALYZE

2019-11-26 Thread Tatsuro Yamada


Hi Amit-san,


On Wed, Nov 27, 2019 at 11:04 AM Tatsuro Yamada
 wrote:

Regarding to other total number columns,
I'll create another patch to add these columns "index_vacuum_total" and
"index_rebuild_count" on the other views. :)


Maybe you meant "index_rebuild_total"?


Yeah, you are right! :)


Thanks,
Tatsuro Yamada

Re: progress report for ANALYZE

2019-11-26 Thread Alvaro Herrera

On 2019-Nov-27, Amit Langote wrote:

> On Tue, Nov 26, 2019 at 9:22 PM Alvaro Herrera  
> wrote:
> >
> > On 2019-Nov-26, Tatsuro Yamada wrote:
> >
> > > > I wonder whether we need the total number of ext stats on
> > > > pg_stat_progress_analyze or not. As you might know, there is the same
> > > > counter on pg_stat_progress_vacuum and pg_stat_progress_cluster.
> > > > For example, index_vacuum_count and index_rebuild_count.
> > >
> > > Would it be better to add the total number column to these views? :)
> >
> > Yeah, I think it would be good to add that.
> 
> Hmm, does it take that long to calculate ext stats on one column?  The
> reason it's worthwhile to have index_vacuum_count,
> index_rebuild_count, etc. is because it can take very long for one
> index to get vacuumed/rebuilt.

Yes, it's noticeable.  It's not as long as building an index, of course,
but it's a long enough fraction of the total analyze time that it should
be reported.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: progress report for ANALYZE

2019-11-26 Thread Amit Langote

Yamada-san,

On Wed, Nov 27, 2019 at 11:04 AM Tatsuro Yamada
 wrote:
> Regarding to other total number columns,
> I'll create another patch to add these columns "index_vacuum_total" and
> "index_rebuild_count" on the other views. :)

Maybe you meant "index_rebuild_total"?

Thanks,
Amit

Re: progress report for ANALYZE

2019-11-26 Thread Amit Langote

On Tue, Nov 26, 2019 at 9:22 PM Alvaro Herrera  wrote:
>
> On 2019-Nov-26, Tatsuro Yamada wrote:
>
> > > I wonder whether we need the total number of ext stats on
> > > pg_stat_progress_analyze or not. As you might know, there is the same
> > > counter on pg_stat_progress_vacuum and pg_stat_progress_cluster.
> > > For example, index_vacuum_count and index_rebuild_count.
> >
> > Would it be better to add the total number column to these views? :)
>
> Yeah, I think it would be good to add that.

Hmm, does it take that long to calculate ext stats on one column?  The
reason it's worthwhile to have index_vacuum_count,
index_rebuild_count, etc. is because it can take very long for one
index to get vacuumed/rebuilt.

Thanks,
Amit

Re: progress report for ANALYZE

2019-11-26 Thread Tatsuro Yamada


Hi Alvaro!

On 2019/11/26 21:22, Alvaro Herrera wrote:

On 2019-Nov-26, Tatsuro Yamada wrote:


I wonder whether we need the total number of ext stats on
pg_stat_progress_analyze or not. As you might know, there is the same
counter on pg_stat_progress_vacuum and pg_stat_progress_cluster.
For example, index_vacuum_count and index_rebuild_count.


Would it be better to add the total number column to these views? :)


Yeah, I think it would be good to add that.



Thanks for your comment!
Okay, I'll add the column "ext_stats_total" to
pg_stat_progress_analyze view on the next patch. :)

Regarding to other total number columns,
I'll create another patch to add these columns "index_vacuum_total" and
"index_rebuild_count" on the other views. :)

Thanks,
Tatsuro Yamada

Re: progress report for ANALYZE

2019-11-26 Thread Alvaro Herrera

On 2019-Nov-26, Tatsuro Yamada wrote:

> > I wonder whether we need the total number of ext stats on
> > pg_stat_progress_analyze or not. As you might know, there is the same
> > counter on pg_stat_progress_vacuum and pg_stat_progress_cluster.
> > For example, index_vacuum_count and index_rebuild_count.
> 
> Would it be better to add the total number column to these views? :)

Yeah, I think it would be good to add that.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: progress report for ANALYZE

2019-11-25 Thread Tatsuro Yamada


Hi Amit-san,



Related to the above,
I wonder whether we need the total number of ext stats on
pg_stat_progress_analyze or not. As you might know, there is the same
counter on pg_stat_progress_vacuum and pg_stat_progress_cluster.
For example, index_vacuum_count and index_rebuild_count.
Would it be added to the total number column to these views? :)



Oops, I made a mistake. :(

What I'd like to say was:
Would it be better to add the total number column to these views? :)

Thanks,
Tatsuro Yamada

Re: progress report for ANALYZE

2019-11-25 Thread Tatsuro Yamada


Hi Amit-san!

Thanks for your comments!



I have looked at the patch and here are some comments.

I think include_children and current_relid are not enough to
understand the progress of analyzing inheritance trees, because even
with current_relid being updated, I can't tell how many more there
will be.  I think it'd be better to show the total number of children
and the number of children processed, just like
pg_stat_progress_create_index does for partitions.  So, instead of
include_children and current_relid, I think it's better to have
child_tables_total, child_tables_done, and current_child_relid, placed
last in the set of columns.


Ah, I understood.
I'll check pg_stat_progress_create_index does for partitions,
and will create a new patch. :)

Related to the above,
I wonder whether we need the total number of ext stats on
pg_stat_progress_analyze or not. As you might know, there is the same
counter on pg_stat_progress_vacuum and pg_stat_progress_cluster.
For example, index_vacuum_count and index_rebuild_count.
Would it be added to the total number column to these views? :)



Also, inheritance tree stats are created *after* creating single table
stats, so I think that it would be better to have a distinct phase
name for that, say "acquiring inherited sample rows".  In
do_analyze_rel(), you can select which of two phases to set based on
whether inh is true or not.  For partitioned tables, the progress
output will immediately switch to this phase, because partitioned
table itself is empty so there's nothing to do in the "acquiring
sample rows" phase.

That's all for now.



Okay! I'll also add the new phase "acquiring inherited sample rows" on
the next patch. :)


Thanks,
Tatsuro Yamada

Re: progress report for ANALYZE

2019-11-18 Thread Amit Langote

Yamada-san,

Thanks for working on this.

On Wed, Nov 6, 2019 at 2:50 PM Tatsuro Yamada
 wrote:
> I revised the patch as following because I realized counting the types of ext
> stats is not useful for users.
>
>   - Attached new patch counts a number of ext stats instead the types of ext 
> stats.
>
> So we can see the counter goes to "2", if we created above ext stats (pg_ext1 
> and
> pg_ext2) and analyzed as you wrote. :)

I have looked at the patch and here are some comments.

I think include_children and current_relid are not enough to
understand the progress of analyzing inheritance trees, because even
with current_relid being updated, I can't tell how many more there
will be.  I think it'd be better to show the total number of children
and the number of children processed, just like
pg_stat_progress_create_index does for partitions.  So, instead of
include_children and current_relid, I think it's better to have
child_tables_total, child_tables_done, and current_child_relid, placed
last in the set of columns.

Also, inheritance tree stats are created *after* creating single table
stats, so I think that it would be better to have a distinct phase
name for that, say "acquiring inherited sample rows".  In
do_analyze_rel(), you can select which of two phases to set based on
whether inh is true or not.  For partitioned tables, the progress
output will immediately switch to this phase, because partitioned
table itself is empty so there's nothing to do in the "acquiring
sample rows" phase.

That's all for now.

Thanks,
Amit

Re: progress report for ANALYZE

2019-11-05 Thread Tatsuro Yamada


Hi Alvaro!

On 2019/11/05 22:38, Alvaro Herrera wrote:

On 2019-Nov-05, Tatsuro Yamada wrote:


==
[Session1]
\! pgbench -i
create statistics pg_ext1 (dependencies) ON aid, bid from pgbench_accounts;
create statistics pg_ext2 (mcv) ON aid, bid from pgbench_accounts;
create statistics pg_ext3 (ndistinct) ON aid, bid from pgbench_accounts;


Wow, it takes a long time to compute these ...

Hmm, you normally wouldn't define stats that way; you'd do this instead:

create statistics pg_ext1 (dependencies, mcv,ndistinct) ON aid, bid from 
pgbench_accounts;


I'd like to say it's a just example of test case. But I understand that
your advice. Thanks! :)

 

I'm not sure if this has an important impact in practice.  What I'm
saying is that I'm not sure that "number of ext stats" is necessarily a
useful number as shown.  I wonder if it's possible to count the number
of items that have been computed for each stats object.  So if you do
this

create statistics pg_ext1 (dependencies, mcv) ON aid, bid from pgbench_accounts;
create statistics pg_ext2 (ndistinct,histogram) ON aid, bid from 
pgbench_accounts;

then the counter goes to 4.  But I also wonder if we need to publish
_which_ type of ext stats is currently being built, in a separate
column.



Hmm... I have never seen a lot of extended stats on a table (with many columns)
but I suppose it will be existence near future because extended stats is an only
solution to correct row estimation error in vanilla PostgreSQL. Therefore, it
would be better to add the counter on the view, I think.

I revised the patch as following because I realized counting the types of ext
stats is not useful for users.

 - Attached new patch counts a number of ext stats instead the types of ext 
stats.

So we can see the counter goes to "2", if we created above ext stats (pg_ext1 
and
pg_ext2) and analyzed as you wrote. :)


Thanks,
Tatsuro Yamada

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index e6c49eebad..c84d9a0775 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -360,6 +360,14 @@ postgres   27093  0.0  0.0  30096  2752 ?Ss   
11:34   0:00 postgres: ser
   
  
 
+ 
+  
pg_stat_progress_analyzepg_stat_progress_analyze
+  One row for each backend (including autovacuum worker processes) 
running
+   ANALYZE, showing current progress.
+   See .
+  
+ 
+
  
   
pg_stat_progress_clusterpg_stat_progress_cluster
   One row for each backend running
@@ -3481,7 +3489,7 @@ SELECT pg_stat_get_backend_pid(s.backendid) AS pid,
PostgreSQL has the ability to report the 
progress of
certain commands during command execution.  Currently, the only commands
which support progress reporting are CREATE INDEX,
-   VACUUM and
+   VACUUM, ANALYZE and
CLUSTER. This may be expanded in the future.
   
 
@@ -3927,6 +3935,140 @@ SELECT pg_stat_get_backend_pid(s.backendid) AS pid,
 
  
 
+ 
+  ANALYZE Progress Reporting
+
+  
+   Whenever ANALYZE is running, the
+   pg_stat_progress_analyze view will contain a
+   row for each backend that is currently running that command.  The tables
+   below describe the information that will be reported and provide
+   information about how to interpret it.
+  
+
+  
+   pg_stat_progress_analyze View
+   
+
+
+  Column
+  Type
+  Description
+ 
+
+
+   
+
+ pid
+ integer
+ Process ID of backend.
+
+ 
+  datid
+  oid
+  OID of the database to which this backend is connected.
+ 
+ 
+  datname
+  name
+  Name of the database to which this backend is connected.
+ 
+ 
+  relid
+  oid
+  OID of the table being analyzed.
+ 
+ 
+  include_children
+  boolean
+  Whether scanning through child tables.
+ 
+ 
+  current_relid
+  oid
+  OID of the table currently being scanned.
+It might be different from relid when analyzing tables that have child 
tables.
+  
+ 
+ 
+  phase
+  text
+  Current processing phase. See 
+ 
+ 
+ sample_blks_total
+ bigint
+ 
+   Total number of heap blocks that will be sampled.
+ 
+ 
+
+ heap_blks_scanned
+ bigint
+ 
+   Number of heap blocks scanned.
+ 
+
+
+ ext_compute_count
+ bigint
+ 
+   Number of computed extended stats.  This counter only advances when the 
phase
+   is computing extended stats.
+ 
+
+   
+   
+  
+
+  
+   ANALYZE phases
+   
+
+
+  Phase
+  Description
+ 
+
+   
+
+ initializing
+ 
+   The command is preparing to begin scanning the heap.  This phase is
+   expected to be very brief.
+ 
+
+
+ acquiring sample rows
+ 
+   The command is currently scanning the 
current_relid
+   to obtain samples.
+ 
+
+
+ computing stats
+ 
+   The

Re: progress report for ANALYZE

2019-11-05 Thread Alvaro Herrera

On 2019-Nov-05, Tatsuro Yamada wrote:

> ==
> [Session1]
> \! pgbench -i
> create statistics pg_ext1 (dependencies) ON aid, bid from pgbench_accounts;
> create statistics pg_ext2 (mcv) ON aid, bid from pgbench_accounts;
> create statistics pg_ext3 (ndistinct) ON aid, bid from pgbench_accounts;

Wow, it takes a long time to compute these ...

Hmm, you normally wouldn't define stats that way; you'd do this instead:

create statistics pg_ext1 (dependencies, mcv,ndistinct) ON aid, bid from 
pgbench_accounts;

I'm not sure if this has an important impact in practice.  What I'm
saying is that I'm not sure that "number of ext stats" is necessarily a
useful number as shown.  I wonder if it's possible to count the number
of items that have been computed for each stats object.  So if you do
this

create statistics pg_ext1 (dependencies, mcv) ON aid, bid from pgbench_accounts;
create statistics pg_ext2 (ndistinct,histogram) ON aid, bid from 
pgbench_accounts;

then the counter goes to 4.  But I also wonder if we need to publish
_which_ type of ext stats is currently being built, in a separate
column.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: progress report for ANALYZE

2019-11-05 Thread Tatsuro Yamada


Hi Alvaro, vignesh,

I rebased the patch on 2a4d96eb, and added new column
"ext_compute_count" in pg_stat_progress_analyze vie to
report a number of computing extended stats.
It is like a "index_vacuum_count" in vacuum progress reporter or
"index_rebuild_count" in cluster progress reporter. :)

Please find attached file: v7.

And the following is a test result:
==
[Session1]
\! pgbench -i
create statistics pg_ext1 (dependencies) ON aid, bid from pgbench_accounts;
create statistics pg_ext2 (mcv) ON aid, bid from pgbench_accounts;
create statistics pg_ext3 (ndistinct) ON aid, bid from pgbench_accounts;

[Session2]
# \a \t
# select * from pg_stat_progress_analyze ; \watch 0.0001

27064|13583|postgres|16405|initializing|f|0|0|0|0
27064|13583|postgres|16405|acquiring sample rows|f|16405|1640|0|0
27064|13583|postgres|16405|acquiring sample rows|f|16405|1640|23|0
27064|13583|postgres|16405|acquiring sample rows|f|16405|1640|64|0
27064|13583|postgres|16405|acquiring sample rows|f|16405|1640|1640|0
27064|13583|postgres|16405|computing stats|f|16405|1640|1640|0
27064|13583|postgres|16405|computing stats|f|16405|1640|1640|0
27064|13583|postgres|16405|computing extended stats|f|16405|1640|1640|0
27064|13583|postgres|16405|computing extended stats|f|16405|1640|1640|1
27064|13583|postgres|16405|computing extended stats|f|16405|1640|1640|2
27064|13583|postgres|16405|computing extended stats|f|16405|1640|1640|3
27064|13583|postgres|16405|finalizing analyze|f|16405|1640|1640|3

Note:
 The result on Session2 was shortened for readability.
 If you'd like to check the whole result, you can see attached file: "hoge.txt".
==

Thanks,
Tatsuro Yamada



diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index e6c49eebad..c84d9a0775 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -360,6 +360,14 @@ postgres   27093  0.0  0.0  30096  2752 ?Ss   
11:34   0:00 postgres: ser
   
  
 
+ 
+  
pg_stat_progress_analyzepg_stat_progress_analyze
+  One row for each backend (including autovacuum worker processes) 
running
+   ANALYZE, showing current progress.
+   See .
+  
+ 
+
  
   
pg_stat_progress_clusterpg_stat_progress_cluster
   One row for each backend running
@@ -3481,7 +3489,7 @@ SELECT pg_stat_get_backend_pid(s.backendid) AS pid,
PostgreSQL has the ability to report the 
progress of
certain commands during command execution.  Currently, the only commands
which support progress reporting are CREATE INDEX,
-   VACUUM and
+   VACUUM, ANALYZE and
CLUSTER. This may be expanded in the future.
   
 
@@ -3927,6 +3935,140 @@ SELECT pg_stat_get_backend_pid(s.backendid) AS pid,
 
  
 
+ 
+  ANALYZE Progress Reporting
+
+  
+   Whenever ANALYZE is running, the
+   pg_stat_progress_analyze view will contain a
+   row for each backend that is currently running that command.  The tables
+   below describe the information that will be reported and provide
+   information about how to interpret it.
+  
+
+  
+   pg_stat_progress_analyze View
+   
+
+
+  Column
+  Type
+  Description
+ 
+
+
+   
+
+ pid
+ integer
+ Process ID of backend.
+
+ 
+  datid
+  oid
+  OID of the database to which this backend is connected.
+ 
+ 
+  datname
+  name
+  Name of the database to which this backend is connected.
+ 
+ 
+  relid
+  oid
+  OID of the table being analyzed.
+ 
+ 
+  include_children
+  boolean
+  Whether scanning through child tables.
+ 
+ 
+  current_relid
+  oid
+  OID of the table currently being scanned.
+It might be different from relid when analyzing tables that have child 
tables.
+  
+ 
+ 
+  phase
+  text
+  Current processing phase. See 
+ 
+ 
+ sample_blks_total
+ bigint
+ 
+   Total number of heap blocks that will be sampled.
+ 
+ 
+
+ heap_blks_scanned
+ bigint
+ 
+   Number of heap blocks scanned.
+ 
+
+
+ ext_compute_count
+ bigint
+ 
+   Number of computed extended stats.  This counter only advances when the 
phase
+   is computing extended stats.
+ 
+
+   
+   
+  
+
+  
+   ANALYZE phases
+   
+
+
+  Phase
+  Description
+ 
+
+   
+
+ initializing
+ 
+   The command is preparing to begin scanning the heap.  This phase is
+   expected to be very brief.
+ 
+
+
+ acquiring sample rows
+ 
+   The command is currently scanning the 
current_relid
+   to obtain samples.
+ 
+
+
+ computing stats
+ 
+   The command is computing stats from the samples obtained during the 
table scan.
+ 
+
+
+ computing extended stats
+ 
+   The command is computing extended stats from the samples obtained in 
the previous phase.

Re: progress report for ANALYZE

2019-11-01 Thread Tatsuro Yamada


Hi vignesh!


On 2019/09/17 20:51, vignesh C wrote:

On Thu, Sep 5, 2019 at 2:31 AM Alvaro Herrera  wrote:


There were some minor problems in v5 -- bogus Docbook as well as
outdated rules.out, small "git diff --check" complaint about whitespace.
This v6 (on today's master) fixes those, no other changes.


+ 
+   The command is preparing to begin scanning the heap.  This phase is
+   expected to be very brief.
+ 
In the above after "." there is an extra space, is this intentional. I
had noticed that in lot of places there is couple of spaces and
sometimes single space across this file.

Like in below, there is single space after ".":
  Time when this process' current transaction was started, or null
   if no transaction is active. If the current
   query is the first of its transaction, this column is equal to the
   query_start column.
  



Sorry for the late reply.

Probably, it is intentional because there are many extra space
in other documents. See below:

# Results of grep
=
$ grep '\.  ' doc/src/sgml/monitoring.sgml | wc -l
114
$ grep '\.  ' doc/src/sgml/information_schema.sgml | wc -l
184
$ grep '\.  ' doc/src/sgml/func.sgml | wc -l
577
=

Therefore, I'm going to leave it as is. :)


Thanks,
Tatsuro Yamada

Re: progress report for ANALYZE

2019-09-17 Thread vignesh C

On Thu, Sep 5, 2019 at 2:31 AM Alvaro Herrera  wrote:
>
> There were some minor problems in v5 -- bogus Docbook as well as
> outdated rules.out, small "git diff --check" complaint about whitespace.
> This v6 (on today's master) fixes those, no other changes.
>
+ 
+   The command is preparing to begin scanning the heap.  This phase is
+   expected to be very brief.
+ 
In the above after "." there is an extra space, is this intentional. I
had noticed that in lot of places there is couple of spaces and
sometimes single space across this file.

Like in below, there is single space after ".":
 Time when this process' current transaction was started, or null
  if no transaction is active. If the current
  query is the first of its transaction, this column is equal to the
  query_start column.
 

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com

Re: progress report for ANALYZE

2019-09-05 Thread Tatsuro Yamada


Hi Alvaro,


There were some minor problems in v5 -- bogus Docbook as well as
outdated rules.out, small "git diff --check" complaint about whitespace.
This v6 (on today's master) fixes those, no other changes.


 
Thanks for fixing that. :)

I'll test it later.


I think we have to address the following comment from Robert. Right?
Do you have any ideas?



I'm not a fan of the way you set the scan-table phase and inh flag in
one place, and then slightly later set the relation OID and block
count.  That creates a race during which users could see the first bit
of data set and the second not set.  I don't see any reason not to set
all four fields together.



Hmm... I understand but it's little difficult because if there are
child rels, acquire_inherited_sample_rows() calls acquire_sample_rows()
(See below). So, it is able to set all four fields together if inh flag
is given as a parameter of those functions, I suppose. But I'm not sure
whether it's okay to add the parameter to both functions or not.
Do you have any ideas?


# do_analyze_rel()
...
if (inh)
numrows = acquire_inherited_sample_rows(onerel, elevel,
rows, targrows,
, );
else
numrows = (*acquirefunc) (onerel, elevel,
  rows, targrows,
  , );


# acquire_inherited_sample_rows()
...
foreach(lc, tableOIDs)
{
...
   /* Check table type (MATVIEW can't happen, but might as well allow) */
if (childrel->rd_rel->relkind == RELKIND_RELATION ||
childrel->rd_rel->relkind == RELKIND_MATVIEW)
{
/* Regular table, so use the regular row acquisition function */
acquirefunc = acquire_sample_rows;
...
/* OK, we'll process this child */
has_child = true;
rels[nrels] = childrel;
acquirefuncs[nrels] = acquirefunc;
...
}
...
for (i = 0; i < nrels; i++)
{
...
AcquireSampleRowsFunc acquirefunc = acquirefuncs[i];
...
if (childtargrows > 0)
{
...
/* Fetch a random sample of the child's rows */
childrows = (*acquirefunc) (childrel, elevel,
rows + numrows, childtargrows,
, )



Thanks,
Tatsuro Yamada

Re: progress report for ANALYZE

2019-09-04 Thread Alvaro Herrera

There were some minor problems in v5 -- bogus Docbook as well as
outdated rules.out, small "git diff --check" complaint about whitespace.
This v6 (on today's master) fixes those, no other changes.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index bf72d0c303..630f9821a9 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -360,6 +360,14 @@ postgres   27093  0.0  0.0  30096  2752 ?Ss   11:34   0:00 postgres: ser
   
  
 
+ 
+  pg_stat_progress_analyzepg_stat_progress_analyze
+  One row for each backend (including autovacuum worker processes) running
+   ANALYZE, showing current progress.
+   See .
+  
+ 
+
  
   pg_stat_progress_clusterpg_stat_progress_cluster
   One row for each backend running
@@ -3481,7 +3489,7 @@ SELECT pg_stat_get_backend_pid(s.backendid) AS pid,
PostgreSQL has the ability to report the progress of
certain commands during command execution.  Currently, the only commands
which support progress reporting are CREATE INDEX,
-   VACUUM and
+   VACUUM, ANALYZE and
CLUSTER. This may be expanded in the future.
   
 
@@ -3927,6 +3935,132 @@ SELECT pg_stat_get_backend_pid(s.backendid) AS pid,
 
  
 
+ 
+  ANALYZE Progress Reporting
+
+  
+   Whenever ANALYZE is running, the
+   pg_stat_progress_analyze view will contain a
+   row for each backend that is currently running that command.  The tables
+   below describe the information that will be reported and provide
+   information about how to interpret it.
+  
+
+  
+   pg_stat_progress_analyze View
+   
+
+
+  Column
+  Type
+  Description
+ 
+
+
+   
+
+ pid
+ integer
+ Process ID of backend.
+
+ 
+  datid
+  oid
+  OID of the database to which this backend is connected.
+ 
+ 
+  datname
+  name
+  Name of the database to which this backend is connected.
+ 
+ 
+  relid
+  oid
+  OID of the table being analyzed.
+ 
+ 
+  include_children
+  boolean
+  Whether scanning through child tables.
+ 
+ 
+  current_relid
+  oid
+  OID of the table currently being scanned.
+It might be different from relid when analyzing tables that have child tables.
+  
+ 
+ 
+  phase
+  text
+  Current processing phase. See 
+ 
+ 
+ sample_blks_total
+ bigint
+ 
+   Total number of heap blocks that will be sampled.
+ 
+ 
+
+ heap_blks_scanned
+ bigint
+ 
+   Number of heap blocks scanned.
+ 
+
+   
+   
+  
+
+  
+   ANALYZE phases
+   
+
+
+  Phase
+  Description
+ 
+
+   
+
+ initializing
+ 
+   The command is preparing to begin scanning the heap.  This phase is
+   expected to be very brief.
+ 
+
+
+ acquiring sample rows
+ 
+   The command is currently scanning the current_relid
+   to obtain samples.
+ 
+
+
+ computing stats
+ 
+   The command is computing stats from the samples obtained during the table scan.
+ 
+
+
+ computing extended stats
+ 
+   The command is computing extended stats from the samples obtained in the previous phase.
+ 
+
+
+ finalizing analyze
+ 
+   The command is updating pg_class. When this phase is completed, 
+   ANALYZE will end.
+ 
+
+   
+   
+  
+ 
+
  
   CLUSTER Progress Reporting
 
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index ea4c85e395..cc0c6291d9 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -964,6 +964,22 @@ CREATE VIEW pg_stat_progress_vacuum AS
 FROM pg_stat_get_progress_info('VACUUM') AS S
 LEFT JOIN pg_database D ON S.datid = D.oid;
 
+CREATE VIEW pg_stat_progress_analyze AS
+SELECT
+S.pid AS pid, S.datid AS datid, D.datname AS datname,
+CAST(S.relid AS oid) AS relid,
+CAST(CAST(S.param2 AS int) AS boolean) AS include_children,
+CAST(S.param3 AS oid) AS current_relid,
+CASE S.param1 WHEN 0 THEN 'initializing'
+  WHEN 1 THEN 'acquiring sample rows'
+  WHEN 2 THEN 'computing stats'
+  WHEN 3 THEN 'computing extended stats'
+  WHEN 4 THEN 'finalizing analyze'
+  END AS phase,
+S.param4 AS sample_blks_total, S.param5 AS heap_blks_scanned
+FROM pg_stat_get_progress_info('ANALYZE') AS S
+LEFT JOIN pg_database D ON S.datid = D.oid;
+
 CREATE VIEW pg_stat_progress_cluster AS
 SELECT
 S.pid AS pid,
diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index d2fdf447b6..6380e4cbbb

Re: progress report for ANALYZE

2019-08-14 Thread Tatsuro Yamada

Hi Alvaro,

On 2019/08/13 23:01, Alvaro Herrera wrote:

Hello,

On 2019-Jul-03, Tatsuro Yamada wrote:

My ex-colleague Vinayak created same patch in 2017 [1], and he
couldn't get commit because there are some reasons such as the
patch couldn't handle analyzing Foreign table. Therefore, I wonder
whether your patch is able to do that or not.

[1] ANALYZE command progress checker
https://www.postgresql.org/message-id/flat/968b4eda-2417-8b7b-d468-71643cf088b6%40openscg.com#574488592fcc9708c38fa44b0dae9006

So just now I went to check the jold thread (which I should have
searched for before writing my own implementation!). It seems clear
that many things are pretty similar in both patch, but I think for the
most part they are similar just because the underlying infrastructure
imposes a certain design already, rather than there being any actual
copying. (To be perfectly clear: I didn't even know that this patch
existed and I didn't grab any code from there to produce my v1.)

I know your patch was not based on Vinayak's old patch because
coding style is different between him and you.

However, you've now modified the patch from what I submitted and I'm
wondering if you've taken any inspiration from Vinayak's old patch. If
so, it seems fair to credit him as co-author in the commit message. It
would be good to get his input on the current patch, though.

Yeah, I'm happy if you added his name as co-authors because I checked
the document including his old patch as a reference. :)

I have not looked at the current version of the patch yet, but I intend
to do so during the upcoming commitfest.

Thanks for moving this forward!

Thanks!
Committing the patch on PG13 makes me happy because Progress reporting
features are important for DBA. :)

On the subject of FDW support: I did look into supporting that before
submitting this. I think it's not academically difficult: just have the
FDW's acquire_sample_rows callback invoke the update_param functions
once in a while. Sadly, in practical terms it looks like postgres_fdw

Actually, I've changed my mind.
Even if there is no FDW support, Progress report for ANALYZE is still useful.
Therefore, FDW support would be preferable but not required for committing
the patch, I believe. :)

Thanks,
Tatsuro Yamada

Re: progress report for ANALYZE

2019-08-14 Thread Alvaro Herrera

On 2019-Aug-14, Etsuro Fujita wrote:

> On Tue, Aug 13, 2019 at 11:01 PM Alvaro Herrera
>  wrote:
> > On the subject of FDW support: I did look into supporting that before
> > submitting this.  I think it's not academically difficult: just have the
> > FDW's acquire_sample_rows callback invoke the update_param functions
> > once in a while.  Sadly, in practical terms it looks like postgres_fdw
> > is quite stupid about ANALYZE (it scans the whole table??) so doing
> > something that's actually useful may not be so easy.  At least, we know
> > the total relation size and maybe we can add the ctid column to the
> > cursor in postgresAcquireSampleRowsFunc so that we have a current block
> > number to report (becing careful about synchronized seqscans).
> 
> I don't follow this thread fully, so I might miss something, but I
> don't think that's fully applicable, because foreign tables managed by
> postgres_fdw can be eg, views on the remote side.

Oh, hmm, well I guess that covers the tables and matviews then ... I'm
not sure there's a good way to cover foreign tables that are views or
other stuff.  Maybe that'll have to do.  But at least it covers more
cases than none.

> > I do wonder why doesn't postgres_fdw use TABLESAMPLE.
> 
> Yeah, that's really what I'm thinking for PG13; but I think we would
> still need to scan the whole table in some cases (eg, when the foreign
> table is a view on the remote side), because the TABLESAMLE clause can
> only be applied to regular tables and materialized views.

Sure.

> > I did not look at other FDWs at all, mind.
> 
> IIUC, oracle_fdw already uses the SAMPLE BLOCK clause for that.  Right?

Yeah, it does that, I checked precisely oracle_fdw this morning.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: progress report for ANALYZE

2019-08-14 Thread Etsuro Fujita

Hi,

On Tue, Aug 13, 2019 at 11:01 PM Alvaro Herrera
 wrote:
> On the subject of FDW support: I did look into supporting that before
> submitting this.  I think it's not academically difficult: just have the
> FDW's acquire_sample_rows callback invoke the update_param functions
> once in a while.  Sadly, in practical terms it looks like postgres_fdw
> is quite stupid about ANALYZE (it scans the whole table??) so doing
> something that's actually useful may not be so easy.  At least, we know
> the total relation size and maybe we can add the ctid column to the
> cursor in postgresAcquireSampleRowsFunc so that we have a current block
> number to report (becing careful about synchronized seqscans).

I don't follow this thread fully, so I might miss something, but I
don't think that's fully applicable, because foreign tables managed by
postgres_fdw can be eg, views on the remote side.

> I do wonder why doesn't postgres_fdw use TABLESAMPLE.

Yeah, that's really what I'm thinking for PG13; but I think we would
still need to scan the whole table in some cases (eg, when the foreign
table is a view on the remote side), because the TABLESAMLE clause can
only be applied to regular tables and materialized views.

> I did not look at other FDWs at all, mind.

IIUC, oracle_fdw already uses the SAMPLE BLOCK clause for that.  Right?

Best regards,
Etsuro Fujita

Re: progress report for ANALYZE

2019-08-13 Thread Alvaro Herrera

Hello,

On 2019-Jul-03, Tatsuro Yamada wrote:

> My ex-colleague Vinayak created same patch in 2017 [1], and he
> couldn't get commit because there are some reasons such as the
> patch couldn't handle analyzing Foreign table. Therefore, I wonder
> whether your patch is able to do that or not.

> [1] ANALYZE command progress checker
> https://www.postgresql.org/message-id/flat/968b4eda-2417-8b7b-d468-71643cf088b6%40openscg.com#574488592fcc9708c38fa44b0dae9006

So just now I went to check the jold thread (which I should have
searched for before writing my own implementation!).  It seems clear
that many things are pretty similar in both patch, but I think for the
most part they are similar just because the underlying infrastructure
imposes a certain design already, rather than there being any actual
copying.  (To be perfectly clear: I didn't even know that this patch
existed and I didn't grab any code from there to produce my v1.)

However, you've now modified the patch from what I submitted and I'm
wondering if you've taken any inspiration from Vinayak's old patch.  If
so, it seems fair to credit him as co-author in the commit message.  It
would be good to get his input on the current patch, though.

I have not looked at the current version of the patch yet, but I intend
to do so during the upcoming commitfest.

Thanks for moving this forward!

On the subject of FDW support: I did look into supporting that before
submitting this.  I think it's not academically difficult: just have the
FDW's acquire_sample_rows callback invoke the update_param functions
once in a while.  Sadly, in practical terms it looks like postgres_fdw
is quite stupid about ANALYZE (it scans the whole table??) so doing
something that's actually useful may not be so easy.  At least, we know
the total relation size and maybe we can add the ctid column to the
cursor in postgresAcquireSampleRowsFunc so that we have a current block
number to report (becing careful about synchronized seqscans).  I think
this should *not* be part of the main ANALYZE-progress commit, though,
because getting that properly sorted out is going to take some more
time.

I do wonder why doesn't postgres_fdw use TABLESAMPLE.

I did not look at other FDWs at all, mind.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: progress report for ANALYZE

2019-08-12 Thread Tatsuro Yamada


Hi Robert and All!


On 2019/08/02 2:48, Robert Haas wrote:> On Thu, Aug 1, 2019 at 4:45 AM Thomas Munro 
 wrote:

On Tue, Jul 23, 2019 at 4:51 PM Tatsuro Yamada
 wrote:

Attached v4 patch file only includes this fix.


I've moved this to the September CF, where it is in "Needs review" state.


/me reviews.



Thanks for your comments! :)



+  scanning_table

I think this should be retitled to something that ends in 'relid',
like all of the corresponding cases in existing progress views.
Perhaps 'active_relid' or 'current_relid'.



Fixed.
I changed "scanning_table" to "current_relid" for analyze in monitoring.sgml.
However, I didn't change "relid" on other places for other commands because
I'd like to create other patch for that later. :)


 > +   The command is computing extended stats from the samples

obtained in the previous phase.

I think you should change this (and the previous one) to say "from the
samples obtained during the table scan."



Fixed.



+   Total number of heap blocks in the scanning_table.

Perhaps I'm confused, but it looks to me like what you are advertising
is the number of blocks that will be sampled, not the total number of
blocks in the table.  I think that's the right thing to advertise, but
then the column should be named and documented that way.



Ah, you are right. Fixed.
I used the following sentence based on Vinayak's patch created two years a go.

- heap_blks_total
- bigint
- 
-   Total number of heap blocks in the current_relid.
- 

+ sample_blks_total
+ bigint
+ 
+   Total number of heap blocks that will be sampled.
+




+ {
+ const int   index[] = {
+ PROGRESS_ANALYZE_TOTAL_BLOCKS,
+ PROGRESS_ANALYZE_SCANREL
+ };
+ const int64 val[] = {
+ nblocks,
+ RelationGetRelid(onerel)
+ };
+
+ pgstat_progress_update_multi_param(2, index, val);
+ }

This block seems to be introduced just so you can declare variables; I
don't think that's good style. It's arguably unnecessary because we
now are selectively allowing variable declarations within functions,
but I think you should just move the first array to the top of the
function and the second declaration to the top of the function
dropping const, and then just do val[0] = nblocks and val[1] =
RelationGetRelid(onerel).  Maybe you can also come up with better
names than 'index' and 'val'.  Same comment applies to another place
where you have something similar.



I agreed and fixed.



Patch seems to need minor rebasing.

Maybe "scanning table" should be renamed "acquiring sample rows," to
match the names used in the code?



I fixed as following:

s/PROGRESS_ANALYZE_PHASE_SCAN_TABLE/
  PROGRESS_ANALYZE_PHASE_ACQUIRING_SAMPLE_ROWS/

s/WHEN 1 THEN 'scanning table'::text/
  WHEN 1 THEN 'acquiring sample rows'::text/



I'm not a fan of the way you set the scan-table phase and inh flag in
one place, and then slightly later set the relation OID and block
count.  That creates a race during which users could see the first bit
of data set and the second not set.  I don't see any reason not to set
all four fields together.



Hmm... I understand but it's little difficult because if there are
child rels, acquire_inherited_sample_rows() calls acquire_sample_rows()
(See below). So, it is able to set all four fields together if inh flag
is given as a parameter of those functions, I suppose. But I'm not sure
whether it's okay to add the parameter to both functions or not.
Do you have any ideas? :)


# do_analyze_rel()
...
if (inh)
numrows = acquire_inherited_sample_rows(onerel, elevel,
rows, targrows,
, );
else
numrows = (*acquirefunc) (onerel, elevel,
  rows, targrows,
  , );


# acquire_inherited_sample_rows()
...
foreach(lc, tableOIDs)
{
...
   /* Check table type (MATVIEW can't happen, but might as well allow) */
if (childrel->rd_rel->relkind == RELKIND_RELATION ||
childrel->rd_rel->relkind == RELKIND_MATVIEW)
{
/* Regular table, so use the regular row acquisition function */
acquirefunc = acquire_sample_rows;
...
/* OK, we'll process this child */
has_child = true;
rels[nrels] = childrel;
acquirefuncs[nrels] = acquirefunc;
...
}
...
for (i = 0; i < nrels; i++)
{
...
AcquireSampleRowsFunc acquirefunc = acquirefuncs[i];
...
if (childtargrows > 0)
{
...
/* Fetch a random sample of the child's rows */
childrows = (*acquirefunc) (childrel, elevel,
rows + numrows, childtargrows,
, )



Please be sure to make the names of the constants you use match up
with the external names as far as it reasonably makes sense, e.g.

Re: progress report for ANALYZE

2019-08-01 Thread Robert Haas

On Thu, Aug 1, 2019 at 4:45 AM Thomas Munro  wrote:
> On Tue, Jul 23, 2019 at 4:51 PM Tatsuro Yamada
>  wrote:
> > Attached v4 patch file only includes this fix.
>
> I've moved this to the September CF, where it is in "Needs review" state.

/me reviews.

+  scanning_table

I think this should be retitled to something that ends in 'relid',
like all of the corresponding cases in existing progress views.
Perhaps 'active_relid' or 'current_relid'.

+   The command is computing extended stats from the samples
obtained in the previous phase.

I think you should change this (and the previous one) to say "from the
samples obtained during the table scan."

+   Total number of heap blocks in the scanning_table.

Perhaps I'm confused, but it looks to me like what you are advertising
is the number of blocks that will be sampled, not the total number of
blocks in the table.  I think that's the right thing to advertise, but
then the column should be named and documented that way.

+ {
+ const int   index[] = {
+ PROGRESS_ANALYZE_TOTAL_BLOCKS,
+ PROGRESS_ANALYZE_SCANREL
+ };
+ const int64 val[] = {
+ nblocks,
+ RelationGetRelid(onerel)
+ };
+
+ pgstat_progress_update_multi_param(2, index, val);
+ }

This block seems to be introduced just so you can declare variables; I
don't think that's good style. It's arguably unnecessary because we
now are selectively allowing variable declarations within functions,
but I think you should just move the first array to the top of the
function and the second declaration to the top of the function
dropping const, and then just do val[0] = nblocks and val[1] =
RelationGetRelid(onerel).  Maybe you can also come up with better
names than 'index' and 'val'.  Same comment applies to another place
where you have something similar.

Patch seems to need minor rebasing.

Maybe "scanning table" should be renamed "acquiring sample rows," to
match the names used in the code?

I'm not a fan of the way you set the scan-table phase and inh flag in
one place, and then slightly later set the relation OID and block
count.  That creates a race during which users could see the first bit
of data set and the second not set.  I don't see any reason not to set
all four fields together.

Please be sure to make the names of the constants you use match up
with the external names as far as it reasonably makes sense, e.g.

+#define PROGRESS_ANALYZE_PHASE_SCAN_TABLE   1
+#define PROGRESS_ANALYZE_PHASE_COMPUTING2
+#define PROGRESS_ANALYZE_PHASE_COMPUTING_EXTENDED 3
+#define PROGRESS_ANALYZE_PHASE_FINALIZE 4

vs.

+WHEN 0 THEN 'initializing'::text
+WHEN 1 THEN 'scanning table'::text
+WHEN 2 THEN 'computing stats'::text
+WHEN 3 THEN 'computing extended stats'::text
+WHEN 4 THEN 'finalizing analyze'::text

Not terrible, but it could be closer.

Similarly with the column names (include_children vs. INH).

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: progress report for ANALYZE

2019-08-01 Thread Thomas Munro

On Tue, Jul 23, 2019 at 4:51 PM Tatsuro Yamada
 wrote:
> Attached v4 patch file only includes this fix.

Hello all,

I've moved this to the September CF, where it is in "Needs review" state.

Thanks,

-- 
Thomas Munro
https://enterprisedb.com

Re: progress report for ANALYZE

2019-07-22 Thread Tatsuro Yamada


Hi Horiguchi-san, Alvaro, Anthony, Julien and Robert,


On 2019/07/22 17:30, Kyotaro Horiguchi wrote:

Hello.

# It's very good timing, as you came in while I have a time after
# finishing a quite nerve-wrackig task..

At Mon, 22 Jul 2019 15:02:16 +0900, Tatsuro Yamada  
wrote in <0876b4fe-26fb-ca32-f179-c696fa3dd...@nttcom.co.jp>

3785|13599|postgres|16384|f|16384|analyzing complete|0|0 <-- Added and
fixed. :)
-


I have some comments.
+   const int   index[] = {
+   PROGRESS_ANALYZE_PHASE,
+   PROGRESS_ANALYZE_TOTAL_BLOCKS,
+   PROGRESS_ANALYZE_BLOCKS_DONE
+   };
+   const int64 val[] = {
+   PROGRESS_ANALYZE_PHASE_ANALYSIS,
+   0, 0
Do you oppose to leaving the total/done blocks alone here:p?



Thanks for your comments!
I created a new patch based on your comments because Alvaro allowed me
to work on revising the patch. :)


Ah, I revised it to remove "0, 0".


Thanks. For a second I thought that
PROGRESS_ANALYZE_PHASE_ANALYSIS was lost but it is living being
renamed to PROGRESS_ANALYZE_PHASE_ANALYSIS.



"PROGRESS_ANALYZE_PHASE_ANALYSIS" was replaced with
"PROGRESS_ANALYZE_PHASE_COMPUTING" because I changed
the phase-name on v3.patch like this:

./src/include/commands/progress.h

+/* Phases of analyze (as advertised via PROGRESS_ANALYZE_PHASE) */
+#define PROGRESS_ANALYZE_PHASE_SCAN_TABLE  1
+#define PROGRESS_ANALYZE_PHASE_COMPUTING   2
+#define PROGRESS_ANALYZE_PHASE_COMPUTING_EXTENDED 3
+#define PROGRESS_ANALYZE_PHASE_FINALIZE4

Is it Okay?

 

+  BlockNumber  nblocks;
+  doubleblksdone = 0;
Why is it a double even though blksdone is of the same domain
with nblocks? And finally:
+pgstat_progress_update_param(PROGRESS_ANALYZE_BLOCKS_DONE,
+   ++blksdone);
It is converted into int64.



Fixed.
But is it suitable to use BlockNumber instead int64?


Yeah, I didn't meant that we should use int64 there. Sorry for
the confusing comment. I meant that blksdone should be of
BlockNumber.



Fixed. Thanks for the clarification. :)
Attached v4 patch file only includes this fix.
 


+  WHEN 2 THEN 'analyzing sample'
+  WHEN 3 THEN 'analyzing sample (extended stats)'
I think we should avoid parenthesized phrases as far as
not-necessary. That makes the column unnecessarily long. The
phase is internally called as "compute stats" so *I* would prefer
something like the followings:
+  WHEN 2 THEN 'computing stats'
+  WHEN 3 THEN 'computing extended stats'
+  WHEN 4 THEN 'analyzing complete'
And if you are intending by this that (as mentioned in the
documentation) "working to complete this analyze", this would be:
+  WHEN 4 THEN 'completing analyze'
+  WHEN 4 THEN 'finalizing analyze'



I have no strong opinion, so I changed the phase-names based on
your suggestions like following:

   WHEN 2 THEN 'computing stats'
   WHEN 3 THEN 'computing extended stats'
   WHEN 4 THEN 'finalizing analyze'

However, I'd like to get any comments from hackers to get a consensus
about the names.


Agreed. Especially such word choosing is not suitable for me..



To Alvaro, Julien, Anthony and Robert,
Do you have any ideas? :)


 

+ Process ID of backend.
of "the" backend. ? And period is not attached on all
descriptions consisting of a single sentence.

+ OID of the database to which this backend is
connected.
+ Name of the database to which this backend is
connected.
"database to which .. is connected" is phrased as "database this
backend is connected to" in pg_stat_acttivity. (Just for consistency)



I checked the sentences on other views of progress monitor (VACUUM,
CREATE INDEX and CLUSTER), and they are same sentence. Therefore,
I'd like to keep it as is. :)


Oh, I see from where the wordings came. But no periods seen after
sentense when it is the only one in a description in other system
views tables. I think the progress views tables should be
corrected following convention.



Sounds reasonable. However, I'd like to create another patch after
this feature was committed since that document fix influence other
progress monitor's description on the document.

 

+ Whether the current scan includes legacy inheritance
children.
This apparently excludes partition tables but actually it is
included.

"Whether scanning through child tables" ?
I'm not sure "child tables" is established as the word to mean
both "partition tables and inheritance children"..



Hmm... I'm also not sure but I fixed as you suggested.


Yeah, I also am not sure the suggestion is good enough as is..


+   Total number of heap blocks to scan in the current table.
 Number of heap blocks on scanning_table to scan?
It might be better that

Re: progress report for ANALYZE

2019-07-22 Thread Kyotaro Horiguchi

Hello.

# It's very good timing, as you came in while I have a time after
# finishing a quite nerve-wrackig task..

At Mon, 22 Jul 2019 15:02:16 +0900, Tatsuro Yamada 
 wrote in 
<0876b4fe-26fb-ca32-f179-c696fa3dd...@nttcom.co.jp>
> >> 3785|13599|postgres|16384|f|16384|analyzing complete|0|0 <-- Added and
> >> fixed. :)
> >> -
> >
> > I have some comments.
> > +   const int   index[] = {
> > +   PROGRESS_ANALYZE_PHASE,
> > +   PROGRESS_ANALYZE_TOTAL_BLOCKS,
> > +   PROGRESS_ANALYZE_BLOCKS_DONE
> > +   };
> > +   const int64 val[] = {
> > +   PROGRESS_ANALYZE_PHASE_ANALYSIS,
> > +   0, 0
> > Do you oppose to leaving the total/done blocks alone here:p?
> 
> 
> Thanks for your comments!
> I created a new patch based on your comments because Alvaro allowed me
> to work on revising the patch. :)
> 
> 
> Ah, I revised it to remove "0, 0".

Thanks. For a second I thought that
PROGRESS_ANALYZE_PHASE_ANALYSIS was lost but it is living being
renamed to PROGRESS_ANALYZE_PHASE_ANALYSIS.

> > +  BlockNumber  nblocks;
> > +  doubleblksdone = 0;
> > Why is it a double even though blksdone is of the same domain
> > with nblocks? And finally:
> > +pgstat_progress_update_param(PROGRESS_ANALYZE_BLOCKS_DONE,
> > +   ++blksdone);
> > It is converted into int64.
> 
> 
> Fixed.
> But is it suitable to use BlockNumber instead int64?

Yeah, I didn't meant that we should use int64 there. Sorry for
the confusing comment. I meant that blksdone should be of
BlockNumber.

> > +  WHEN 2 THEN 'analyzing sample'
> > +  WHEN 3 THEN 'analyzing sample (extended stats)'
> > I think we should avoid parenthesized phrases as far as
> > not-necessary. That makes the column unnecessarily long. The
> > phase is internally called as "compute stats" so *I* would prefer
> > something like the followings:
> > +  WHEN 2 THEN 'computing stats'
> > +  WHEN 3 THEN 'computing extended stats'
> > +  WHEN 4 THEN 'analyzing complete'
> > And if you are intending by this that (as mentioned in the
> > documentation) "working to complete this analyze", this would be:
> > +  WHEN 4 THEN 'completing analyze'
> > +  WHEN 4 THEN 'finalizing analyze'
> 
> 
> I have no strong opinion, so I changed the phase-names based on
> your suggestions like following:
> 
>   WHEN 2 THEN 'computing stats'
>   WHEN 3 THEN 'computing extended stats'
>   WHEN 4 THEN 'finalizing analyze'
> 
> However, I'd like to get any comments from hackers to get a consensus
> about the names.

Agreed. Especially such word choosing is not suitable for me..

> > + Process ID of backend.
> > of "the" backend. ? And period is not attached on all
> > descriptions consisting of a single sentence.
> >
> > + OID of the database to which this backend is
> > connected.
> > + Name of the database to which this backend is
> > connected.
> > "database to which .. is connected" is phrased as "database this
> > backend is connected to" in pg_stat_acttivity. (Just for consistency)
> 
> 
> I checked the sentences on other views of progress monitor (VACUUM,
> CREATE INDEX and CLUSTER), and they are same sentence. Therefore,
> I'd like to keep it as is. :)

Oh, I see from where the wordings came. But no periods seen after
sentense when it is the only one in a description in other system
views tables. I think the progress views tables should be
corrected following convention.

> > + Whether the current scan includes legacy inheritance
> > children.
> > This apparently excludes partition tables but actually it is
> > included.
> >
> >"Whether scanning through child tables" ?
> > I'm not sure "child tables" is established as the word to mean
> > both "partition tables and inheritance children"..
> 
> 
> Hmm... I'm also not sure but I fixed as you suggested.

Yeah, I also am not sure the suggestion is good enough as is..


> > +   The table being scanned (differs from relid
> > +   only when processing partitions or inheritance children).
> > Is  needed? And I think the parentheses are not needed.
> >OID of the table currently being scanned. Can differ from relid
> >when analyzing tables that have child tables.
> 
> 
> How about:
>   OID of the table currently being scanned.
>   It might be different from relid when analyzing tables that have child
>   tables.
> 
> 
> 
> > +   Total number of heap blocks to scan in the current table.
> > Number of heap blocks on scanning_table to scan?
> > It might be better that this description describes that this
> > and the next column is meaningful only while the phase
> > "scanning table".
> 
> 
> How about:
>   Total number of heap blocks in the scanning_table.

(For me, ) that seems like it shows blocks including all
descendents

Re: progress report for ANALYZE

2019-07-22 Thread Tatsuro Yamada


Hi Horiguchi-san!


On 2019/07/11 19:56, Kyotaro Horiguchi wrote:

Hello.

At Tue, 9 Jul 2019 17:38:44 +0900, Tatsuro Yamada  
wrote in <244cb241-168b-d6a9-c45f-a80c34cdc...@nttcom.co.jp>

Hi Alvaro, Anthony, Julien and Robert,

On 2019/07/09 3:47, Julien Rouhaud wrote:

On Mon, Jul 8, 2019 at 8:44 PM Robert Haas 
wrote:


On Mon, Jul 8, 2019 at 2:18 PM Alvaro Herrera
 wrote:

Yeah, I got the impression that that was determined to be the
desirable
behavior, so I made it do that, but I'm not really happy about it
either.  We're not too late to change the CREATE INDEX behavior, but
let's discuss what is it that we want.


I don't think I intended to make any such determination -- which
commit do you think established this as the canonical behavior?

I propose that once a field is set, we should leave it set until the
end.

+1
Note that this patch is already behaving like that if the table only
contains dead rows.


+1 from me.


3785|13599|postgres|16384|f|16384|analyzing complete|0|0 <-- Added and
fixed. :)
-


I have some comments.

+   const int   index[] = {
+   PROGRESS_ANALYZE_PHASE,
+   PROGRESS_ANALYZE_TOTAL_BLOCKS,
+   PROGRESS_ANALYZE_BLOCKS_DONE
+   };
+   const int64 val[] = {
+   PROGRESS_ANALYZE_PHASE_ANALYSIS,
+   0, 0

Do you oppose to leaving the total/done blocks alone here:p?



Thanks for your comments!
I created a new patch based on your comments because Alvaro allowed me
to work on revising the patch. :)


Ah, I revised it to remove "0, 0".




+  BlockNumber  nblocks;
+  doubleblksdone = 0;

Why is it a double even though blksdone is of the same domain
with nblocks? And finally:

+pgstat_progress_update_param(PROGRESS_ANALYZE_BLOCKS_DONE,
+   ++blksdone);

It is converted into int64.



Fixed.
But is it suitable to use BlockNumber instead int64?


 

+  WHEN 2 THEN 'analyzing sample'
+  WHEN 3 THEN 'analyzing sample (extended stats)'

I think we should avoid parenthesized phrases as far as
not-necessary. That makes the column unnecessarily long. The
phase is internally called as "compute stats" so *I* would prefer
something like the followings:

+  WHEN 2 THEN 'computing stats'
+  WHEN 3 THEN 'computing extended stats'



+  WHEN 4 THEN 'analyzing complete'

And if you are intending by this that (as mentioned in the
documentation) "working to complete this analyze", this would be:

+  WHEN 4 THEN 'completing analyze'
+  WHEN 4 THEN 'finalizing analyze'



I have no strong opinion, so I changed the phase-names based on
your suggestions like following:

  WHEN 2 THEN 'computing stats'
  WHEN 3 THEN 'computing extended stats'
  WHEN 4 THEN 'finalizing analyze'

However, I'd like to get any comments from hackers to get a consensus
about the names.




+ Process ID of backend.

of "the" backend. ? And period is not attached on all
descriptions consisting of a single sentence.

+  OID of the database to which this backend is connected.
+  Name of the database to which this backend is connected.

"database to which .. is connected" is phrased as "database this
backend is connected to" in pg_stat_acttivity. (Just for consistency)



I checked the sentences on other views of progress monitor (VACUUM,
CREATE INDEX and CLUSTER), and they are same sentence. Therefore,
I'd like to keep it as is. :)




+  Whether the current scan includes legacy inheritance 
children.

This apparently excludes partition tables but actually it is
included.

   "Whether scanning through child tables" ?

I'm not sure "child tables" is established as the word to mean
both "partition tables and inheritance children"..



Hmm... I'm also not sure but I fixed as you suggested.




+   The table being scanned (differs from relid
+   only when processing partitions or inheritance children).

Is  needed? And I think the parentheses are not needed.

   OID of the table currently being scanned. Can differ from relid
   when analyzing tables that have child tables.



How about:
  OID of the table currently being scanned.
  It might be different from relid when analyzing tables that have child tables.




+   Total number of heap blocks to scan in the current table.

Number of heap blocks on scanning_table to scan?

It might be better that this description describes that this
and the next column is meaningful only while the phase
"scanning table".



How about:
  Total number of heap blocks in the scanning_table.





+   The command is currently scanning the table.

"sample(s)" comes somewhat abruptly in the next item. Something
like "The command is currently scanning the table
scanning_table to obtain samples"
might be better.



Fixed.

Re: progress report for ANALYZE

2019-07-11 Thread Kyotaro Horiguchi

Hello.

At Tue, 9 Jul 2019 17:38:44 +0900, Tatsuro Yamada 
 wrote in 
<244cb241-168b-d6a9-c45f-a80c34cdc...@nttcom.co.jp>
> Hi Alvaro, Anthony, Julien and Robert,
> 
> On 2019/07/09 3:47, Julien Rouhaud wrote:
> > On Mon, Jul 8, 2019 at 8:44 PM Robert Haas 
> > wrote:
> >>
> >> On Mon, Jul 8, 2019 at 2:18 PM Alvaro Herrera
> >>  wrote:
> >>> Yeah, I got the impression that that was determined to be the
> >>> desirable
> >>> behavior, so I made it do that, but I'm not really happy about it
> >>> either.  We're not too late to change the CREATE INDEX behavior, but
> >>> let's discuss what is it that we want.
> >>
> >> I don't think I intended to make any such determination -- which
> >> commit do you think established this as the canonical behavior?
> >>
> >> I propose that once a field is set, we should leave it set until the
> >> end.
> > +1
> > Note that this patch is already behaving like that if the table only
> > contains dead rows.

+1 from me.

> I fixed the patch including:
> 
>   - Replace "if" to "else if". (Suggested by Julien)
>   - Fix typo s/ech/each/. (Suggested by Anthony)
>   - Add Phase "analyzing complete" in the pgstat view. (Suggested by
>   - Julien, Robert and me)
> It was overlooked to add it in system_views.sql.
> 
> I share my re-test result, see below:
> 
> -
> [Session #1]
> create table hoge as select * from generate_series(1, 100) a;
> analyze verbose hoge;
> 
> [Session #2]
> \a \t
> select * from pg_stat_progress_analyze; \watch 0.001
> 
> 3785|13599|postgres|16384|f|16384|scanning table|4425|6
> 3785|13599|postgres|16384|f|16384|scanning table|4425|31
> 3785|13599|postgres|16384|f|16384|scanning table|4425|70
> 3785|13599|postgres|16384|f|16384|scanning table|4425|109
> ...
> 3785|13599|postgres|16384|f|16384|scanning table|4425|4425
> 3785|13599|postgres|16384|f|16384|scanning table|4425|4425
> 3785|13599|postgres|16384|f|16384|scanning table|4425|4425
> 3785|13599|postgres|16384|f|16384|analyzing sample|0|0
> 3785|13599|postgres|16384|f|16384|analyzing complete|0|0 <-- Added and
> fixed. :)
> -

I have some comments.

+   const int   index[] = {
+   PROGRESS_ANALYZE_PHASE,
+   PROGRESS_ANALYZE_TOTAL_BLOCKS,
+   PROGRESS_ANALYZE_BLOCKS_DONE
+   };
+   const int64 val[] = {
+   PROGRESS_ANALYZE_PHASE_ANALYSIS,
+   0, 0

Do you oppose to leaving the total/done blocks alone here:p?

+  BlockNumber  nblocks;
+  doubleblksdone = 0;

Why is it a double even though blksdone is of the same domain
with nblocks? And finally:

+pgstat_progress_update_param(PROGRESS_ANALYZE_BLOCKS_DONE,
+   ++blksdone);

It is converted into int64.

+  WHEN 2 THEN 'analyzing sample'
+  WHEN 3 THEN 'analyzing sample (extended stats)'

I think we should avoid parenthesized phrases as far as
not-necessary. That makes the column unnecessarily long. The
phase is internally called as "compute stats" so *I* would prefer
something like the followings:

+  WHEN 2 THEN 'computing stats'
+  WHEN 3 THEN 'computing extended stats'

+  WHEN 4 THEN 'analyzing complete'

And if you are intending by this that (as mentioned in the
documentation) "working to complete this analyze", this would be:

+  WHEN 4 THEN 'completing analyze'
+  WHEN 4 THEN 'finalizing analyze'

+ Process ID of backend.

of "the" backend. ? And period is not attached on all
descriptions consisting of a single sentence.

+  OID of the database to which this backend is connected.
+  Name of the database to which this backend is connected.

"database to which .. is connected" is phrased as "database this
backend is connected to" in pg_stat_acttivity. (Just for consistency)

+  Whether the current scan includes legacy inheritance 
children.

This apparently excludes partition tables but actually it is
included.

  "Whether scanning through child tables" ?

I'm not sure "child tables" is established as the word to mean
both "partition tables and inheritance children"..

+   The table being scanned (differs from relid
+   only when processing partitions or inheritance children).

Is  needed? And I think the parentheses are not needed.

  OID of the table currently being scanned. Can differ from relid
  when analyzing tables that have child tables.

+   Total number of heap blocks to scan in the current table.

   Number of heap blocks on scanning_table to scan?

It might be better that this description describes that this
and the next column is meaningful only while the phase
"scanning table".

+   The command is currently scanning the table.

"sample(s)" comes somewhat abruptly in the next item.

Re: progress report for ANALYZE

2019-07-10 Thread Robert Haas

On Wed, Jul 10, 2019 at 9:26 AM Alvaro Herrera  wrote:
> On 2019-Jul-10, Robert Haas wrote:
> > On Tue, Jul 9, 2019 at 6:12 PM Alvaro Herrera  
> > wrote:
> > > Hmm, ok.  In CREATE INDEX, we use the block counters multiple times.
> >
> > Why do we do that?
>
> Because we scan the table first, then the index, then the table again
> (last two for the validation phase of CIC).  We count "block numbers"
> separately for each of those, and keep those counters in the same pair
> of columns.  I think we also do that for tuple counters in one case.

Hmm.  I think I would have been inclined to use different counter
numbers for table blocks and index blocks, but perhaps we don't have
room. Anyway, leaving them set until we need them again seems like the
best we can do as things stand.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: progress report for ANALYZE

2019-07-10 Thread Alvaro Herrera

On 2019-Jul-10, Robert Haas wrote:

> On Tue, Jul 9, 2019 at 6:12 PM Alvaro Herrera  
> wrote:
> > Hmm, ok.  In CREATE INDEX, we use the block counters multiple times.
> 
> Why do we do that?

Because we scan the table first, then the index, then the table again
(last two for the validation phase of CIC).  We count "block numbers"
separately for each of those, and keep those counters in the same pair
of columns.  I think we also do that for tuple counters in one case.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: progress report for ANALYZE

2019-07-10 Thread Robert Haas

On Tue, Jul 9, 2019 at 6:12 PM Alvaro Herrera  wrote:
> Hmm, ok.  In CREATE INDEX, we use the block counters multiple times.

Why do we do that?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: progress report for ANALYZE

2019-07-09 Thread Alvaro Herrera

On 2019-Jul-08, Robert Haas wrote:

> On Mon, Jul 8, 2019 at 2:18 PM Alvaro Herrera  
> wrote:
> > Yeah, I got the impression that that was determined to be the desirable
> > behavior, so I made it do that, but I'm not really happy about it
> > either.  We're not too late to change the CREATE INDEX behavior, but
> > let's discuss what is it that we want.
> 
> I don't think I intended to make any such determination -- which
> commit do you think established this as the canonical behavior?

No commit, just discussion in the CREATE INDEX thread.

> I propose that once a field is set, we should leave it set until the end.

Hmm, ok.  In CREATE INDEX, we use the block counters multiple times. We
can leave them set until the next time we need them, I suppose.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: progress report for ANALYZE

2019-07-09 Thread Tatsuro Yamada


Hi Alvaro, Anthony, Julien and Robert,

On 2019/07/09 3:47, Julien Rouhaud wrote:

On Mon, Jul 8, 2019 at 8:44 PM Robert Haas  wrote:


On Mon, Jul 8, 2019 at 2:18 PM Alvaro Herrera  wrote:

Yeah, I got the impression that that was determined to be the desirable
behavior, so I made it do that, but I'm not really happy about it
either.  We're not too late to change the CREATE INDEX behavior, but
let's discuss what is it that we want.


I don't think I intended to make any such determination -- which
commit do you think established this as the canonical behavior?

I propose that once a field is set, we should leave it set until the end.


+1

Note that this patch is already behaving like that if the table only
contains dead rows.



I fixed the patch including:

  - Replace "if" to "else if". (Suggested by Julien)
  - Fix typo s/ech/each/. (Suggested by Anthony)
  - Add Phase "analyzing complete" in the pgstat view. (Suggested by Julien, 
Robert and me)
It was overlooked to add it in system_views.sql.

I share my re-test result, see below:

-
[Session #1]
create table hoge as select * from generate_series(1, 100) a;
analyze verbose hoge;

[Session #2]
\a \t
select * from pg_stat_progress_analyze; \watch 0.001

3785|13599|postgres|16384|f|16384|scanning table|4425|6
3785|13599|postgres|16384|f|16384|scanning table|4425|31
3785|13599|postgres|16384|f|16384|scanning table|4425|70
3785|13599|postgres|16384|f|16384|scanning table|4425|109
...
3785|13599|postgres|16384|f|16384|scanning table|4425|4425
3785|13599|postgres|16384|f|16384|scanning table|4425|4425
3785|13599|postgres|16384|f|16384|scanning table|4425|4425
3785|13599|postgres|16384|f|16384|analyzing sample|0|0
3785|13599|postgres|16384|f|16384|analyzing complete|0|0 <-- Added and fixed. :)
-

Thanks,
Tatsuro Yamada



diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index bf72d0c..c368444 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -361,6 +361,14 @@ postgres   27093  0.0  0.0  30096  2752 ?Ss   
11:34   0:00 postgres: ser
  
 
  
+  
pg_stat_progress_analyzepg_stat_progress_analyze
+  One row for each backend (including autovacuum worker processes) 
running
+   ANALYZE, showing current progress.
+   See .
+  
+ 
+
+ 
   
pg_stat_progress_clusterpg_stat_progress_cluster
   One row for each backend running
CLUSTER or VACUUM FULL, showing 
current progress.
@@ -3927,6 +3935,134 @@ SELECT pg_stat_get_backend_pid(s.backendid) AS pid,
 
  
 
+ 
+  ANALYZE Progress Reporting
+
+  
+   Whenever ANALYZE is running, the
+   pg_stat_progress_analyze view will contain a
+   row for each backend that is currently running that command.  The tables
+   below describe the information that will be reported and provide
+   information about how to interpret it.
+  
+
+  
+   pg_stat_progress_analyze View
+   
+
+
+  Column
+  Type
+  Description
+ 
+
+
+   
+
+ pid
+ integer
+ Process ID of backend.
+
+ 
+  datid
+  oid
+  OID of the database to which this backend is connected.
+ 
+ 
+  datname
+  name
+  Name of the database to which this backend is connected.
+ 
+ 
+  relid
+  oid
+  OID of the table being analyzed.
+ 
+ 
+  include_children
+  boolean
+  Whether the current scan includes legacy inheritance 
children.
+ 
+ 
+  scanning_table
+  oid
+  
+   The table being scanned (differs from relid
+   only when processing partitions or inheritance children).
+  
+ 
+ 
+  phase
+  text
+  Current processing phase. See 
+ 
+ 
+ heap_blks_total
+ bigint
+ 
+   Total number of heap blocks to scan in the current table.
+ 
+ 
+
+ heap_blks_scanned
+ bigint
+ 
+   Number of heap blocks scanned.
+ 
+
+   
+   
+  
+
+  
+   ANALYZE phases
+   
+
+
+  Phase
+  Description
+ 
+
+   
+
+ initializing
+ 
+   The command is preparing to begin scanning the heap.  This phase is
+   expected to be very brief.
+ 
+
+
+ scanning table
+ 
+   The command is currently scanning the table.
+ 
+
+
+ analyzing sample
+ 
+   ANALYZE is currently extracting statistical data
+   from the sample obtained.
+ 
+
+
+ analyzing sample (extended)
+ 
+   ANALYZE is currently extracting statistical data
+   for extended statistics from the sample obtained.
+ 
+
+
+ analyzing complete
+ 
+   The command is updating pg_class. When this phase is completed, 
+   ANALYZE will end.
+ 
+
+   
+   
+  
+ 
+
  
   CLUSTER Progress Reporting
 
diff --git

Re: progress report for ANALYZE

2019-07-08 Thread Julien Rouhaud

On Mon, Jul 8, 2019 at 8:44 PM Robert Haas  wrote:
>
> On Mon, Jul 8, 2019 at 2:18 PM Alvaro Herrera  
> wrote:
> > Yeah, I got the impression that that was determined to be the desirable
> > behavior, so I made it do that, but I'm not really happy about it
> > either.  We're not too late to change the CREATE INDEX behavior, but
> > let's discuss what is it that we want.
>
> I don't think I intended to make any such determination -- which
> commit do you think established this as the canonical behavior?
>
> I propose that once a field is set, we should leave it set until the end.

+1

Note that this patch is already behaving like that if the table only
contains dead rows.

Re: progress report for ANALYZE

2019-07-08 Thread Robert Haas

On Mon, Jul 8, 2019 at 2:18 PM Alvaro Herrera  wrote:
> Yeah, I got the impression that that was determined to be the desirable
> behavior, so I made it do that, but I'm not really happy about it
> either.  We're not too late to change the CREATE INDEX behavior, but
> let's discuss what is it that we want.

I don't think I intended to make any such determination -- which
commit do you think established this as the canonical behavior?

I propose that once a field is set, we should leave it set until the end.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: progress report for ANALYZE

2019-07-08 Thread Alvaro Herrera

On 2019-Jul-08, Robert Haas wrote:

> On Mon, Jul 8, 2019 at 5:29 AM Tatsuro Yamada
>  wrote:
> > 17520|13599|postgres|16387|f|16387|scanning table|4425|4425
> > 17520|13599|postgres|16387|f|16387|analyzing sample|0|0
> > 17520|13599|postgres|16387|f|16387||0|0  <-- Is it Okay??
> 
> Why do we zero out the block numbers when we switch phases?  The
> CREATE INDEX progress reporting patch does that kind of thing too, and
> it seems like poor behavior to me.

Yeah, I got the impression that that was determined to be the desirable
behavior, so I made it do that, but I'm not really happy about it
either.  We're not too late to change the CREATE INDEX behavior, but
let's discuss what is it that we want.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: progress report for ANALYZE

2019-07-08 Thread Robert Haas

On Mon, Jul 8, 2019 at 5:29 AM Tatsuro Yamada
 wrote:
> 17520|13599|postgres|16387|f|16387|scanning table|4425|4425
> 17520|13599|postgres|16387|f|16387|analyzing sample|0|0
> 17520|13599|postgres|16387|f|16387||0|0  <-- Is it Okay??

Why do we zero out the block numbers when we switch phases?  The
CREATE INDEX progress reporting patch does that kind of thing too, and
it seems like poor behavior to me.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: progress report for ANALYZE

2019-07-08 Thread Tatsuro Yamada


Hi Alvaro,


I'll review your patch in this week. :)


I tested your patch on 6b854896.
Here is a result. See below:

-
[Session #1]
create table hoge as select * from generate_series(1, 100) a;
analyze verbose hoge;

[Session #2]
\a
\t
select * from pg_stat_progress_analyze; \watch 0.001

17520|13599|postgres|16387|f|16387|scanning table|4425|14
17520|13599|postgres|16387|f|16387|scanning table|4425|64
17520|13599|postgres|16387|f|16387|scanning table|4425|111
...
17520|13599|postgres|16387|f|16387|scanning table|4425|4425
17520|13599|postgres|16387|f|16387|scanning table|4425|4425
17520|13599|postgres|16387|f|16387|scanning table|4425|4425
17520|13599|postgres|16387|f|16387|analyzing sample|0|0
17520|13599|postgres|16387|f|16387||0|0  <-- Is it Okay??
-

I have a question of the last line of the result.
I'm not sure it is able or not, but I guess it would be better
to keep the phase name (analyzing sample) on the view until the
end of this command. :)

Regards,
Tatsuro Yamada

Re: progress report for ANALYZE

2019-07-02 Thread Tatsuro Yamada


Hi Alvaro!

On 2019/06/22 3:52, Alvaro Herrera wrote:

Hello

Here's a patch that implements progress reporting for ANALYZE.


Sorry for the late reply.
My email address was changed to tatsuro.yamada...@nttcom.co.jp.

I have a question about your patch.
My ex-colleague Vinayak created same patch in 2017 [1], and he
couldn't get commit because there are some reasons such as the
patch couldn't handle analyzing Foreign table. Therefore, I wonder
whether your patch is able to do that or not.

However, actually, I think it's okay because the feature is useful
for DBAs, even if your patch can't handle Foreign table.

I'll review your patch in this week. :)
 
[1] ANALYZE command progress checker

https://www.postgresql.org/message-id/flat/968b4eda-2417-8b7b-d468-71643cf088b6%40openscg.com#574488592fcc9708c38fa44b0dae9006

Regards,
Tatsuro Yamada

Re: progress report for ANALYZE

2019-07-02 Thread Julien Rouhaud

On Fri, Jun 21, 2019 at 8:52 PM Alvaro Herrera  wrote:
>
> Here's a patch that implements progress reporting for ANALYZE.

Patch applies, code and doc and compiles cleanly.  I have few comments:

@@ -512,7 +529,18 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
if (numrows > 0)
{
MemoryContext col_context,
-   old_context;
+ old_context;
+   const int   index[] = {
+   PROGRESS_ANALYZE_PHASE,
+   PROGRESS_ANALYZE_TOTAL_BLOCKS,
+   PROGRESS_ANALYZE_BLOCKS_DONE
+   };
+   const int64 val[] = {
+   PROGRESS_ANALYZE_PHASE_ANALYSIS,
+   0, 0
+   };
+
+   pgstat_progress_update_multi_param(3, index, val);
[...]
}
+   pgstat_progress_update_param(PROGRESS_ANALYZE_PHASE,
+PROGRESS_ANALYZE_PHASE_COMPLETE);
+
If there wasn't any row but multiple blocks were scanned, the
PROGRESS_ANALYZE_PHASE_COMPLETE will still show the informations about
the blocks that were scanned.  I'm not sure if we should stay
consistent here.

diff --git a/src/backend/utils/adt/pgstatfuncs.c
b/src/backend/utils/adt/pgstatfuncs.c
index 05240bfd14..98b01e54fa 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -469,6 +469,8 @@ pg_stat_get_progress_info(PG_FUNCTION_ARGS)
/* Translate command name into command type code. */
if (pg_strcasecmp(cmd, "VACUUM") == 0)
cmdtype = PROGRESS_COMMAND_VACUUM;
+   if (pg_strcasecmp(cmd, "ANALYZE") == 0)
+   cmdtype = PROGRESS_COMMAND_ANALYZE;
else if (pg_strcasecmp(cmd, "CLUSTER") == 0)
cmdtype = PROGRESS_COMMAND_CLUSTER;
else if (pg_strcasecmp(cmd, "CREATE INDEX") == 0)

it should be an "else if" here.

Everything else LGTM.

62 matches

Mail list logo