On 2017/03/17 10:38, Robert Haas wrote:
On Fri, Mar 10, 2017 at 2:46 AM, vinayak
<pokale_vinayak...@lab.ntt.co.jp> wrote:
Thank you for reviewing the patch.

The attached patch incorporated Michael and Amit comments also.
I reviewed this tonight.
Thank you for reviewing the patch.
+        /* Report compute index stats phase */
+        pgstat_progress_update_param(PROGRESS_ANALYZE_PHASE,

Hmm, is there really any point to this?  And is it even accurate?  It
doesn't look to me like we are computing any index statistics here;
we're only allocating a few in-memory data structures here, I think,
which is not a statistics computation and probably doesn't take any
significant time.

+        /* Report compute heap stats phase */
+        pgstat_progress_update_param(PROGRESS_ANALYZE_PHASE,
+                                    PROGRESS_ANALYZE_PHASE_COMPUTE_HEAP_STATS);

The phase you label as computing heap statistics also includes the
computation of index statistics.  I wouldn't bother separating the
computation of heap and index statistics; I'd just have a "compute
statistics" phase that begins right after the comment that starts
"Compute the statistics."
Understood. Fixed in the attached patch.

+    /* Report that we are now doing index cleanup */
+    pgstat_progress_update_param(PROGRESS_ANALYZE_PHASE,
+                                PROGRESS_ANALYZE_PHASE_INDEX_CLEANUP);

OK, this doesn't make any sense either.  We are not cleaning up the
indexes here.  We are just closing them and releasing resources.  I
don't see why you need this phase at all; it can't last more than some
tiny fraction of a second.  It seems like you're trying to copy the
exact same phases that exist for vacuum instead of thinking about what
makes sense for ANALYZE.
Understood. I have removed this phase.

+        /* Report number of heap blocks scaned so far*/
+        pgstat_progress_update_param(PROGRESS_ANALYZE_HEAP_BLKS_SCANNED,

While I don't think this is necessarily a bad thing to report, I find
it very surprising that you're not reporting what seems to me like the
most useful thing here - namely, the number of blocks that will be
sampled in total and the number of those that you've selected.
Instead, you're just reporting the length of the relation and the
last-sampled block, which is a less-accurate guide to total progress.
There are enough counters to report both things, so maybe we should.

+    /* Report total number of sample rows*/
+    pgstat_progress_update_param(PROGRESS_ANALYZE_TOTAL_SAMPLE_ROWS, numrows);

As an alternative to the previous paragraph, yet another thing you
could report is number of rows sampled out of total rows to be
sampled.  But this isn't the way to do it: you are reporting the
number of rows you sampled only once you've finished sampling.  The
point of progress reporting is to report progress -- that is, to
report this value as it's updated, not just to report it when ANALYZE
is practically done and the value has reached its maximum.
I have reported number of rows sampled out of total rows to be sampled.
I have not reported the number of blocks that will be sampled in total.
So currect pg_stat_progress_analyze view looks like:

=# \d+ pg_stat_progress_analyze
                         View "pg_catalog.pg_stat_progress_analyze"
Column | Type | Collation | Nullable | Default | Storage | Descripti
pid | integer | | | | plain | datid | oid | | | | plain | datname | name | | | | plain | relid | oid | | | | plain | phase | text | | | | extended | num_target_sample_rows | bigint | | | | plain | num_rows_sampled | bigint | | | | plain |
View definition:
 SELECT s.pid,
        CASE s.param1
            WHEN 0 THEN 'initializing'::text
            WHEN 1 THEN 'collecting sample rows'::text
            WHEN 2 THEN 'computing statistics'::text
            ELSE NULL::text
        END AS phase,
    s.param2 AS num_target_sample_rows,
    s.param3 AS num_rows_sampled
FROM pg_stat_get_progress_info('ANALYZE'::text) s(pid, datid, relid, param1, param2, p
aram3, param4, param5, param6, param7, param8, param9, param10)
     LEFT JOIN pg_database d ON s.datid = d.oid;

The documentation for this patch isn't very good, either.  You forgot
to update the part of the documentation that says that VACUUM is the
only command that currently supports progress reporting, and the
descriptions of the phases and progress counters are brief and not
entirely accurate - e.g. heap_blks_scanned is not actually the number
of heap blocks scanned, because we are sampling, not reading all the
Understood. I have updated the documentation.
I will also try to improve documentation.
When adding calls to the progress reporting functions, please consider
whether a blank line should be added before or after the new code, or
both, or neither.  I'd say some blank lines are needed in a few places
where you didn't add them.
+1. I have added blank lines in a few places.

Vinayak Pokale

Attachment: pg_stat_progress_analyze_v3.patch
Description: binary/octet-stream

Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:

Reply via email to