On 2017/03/17 10:38, Robert Haas wrote:
On Fri, Mar 10, 2017 at 2:46 AM, vinayak <pokale_vinayak...@lab.ntt.co.jp> wrote:Thank you for reviewing the patch.The attached patch incorporated Michael and Amit comments also.I reviewed this tonight.
Thank you for reviewing the patch.
+ /* Report compute index stats phase */ + pgstat_progress_update_param(PROGRESS_ANALYZE_PHASE, + PROGRESS_ANALYZE_PHASE_COMPUTE_INDEX_STATS); Hmm, is there really any point to this? And is it even accurate? It doesn't look to me like we are computing any index statistics here; we're only allocating a few in-memory data structures here, I think, which is not a statistics computation and probably doesn't take any significant time. + /* Report compute heap stats phase */ + pgstat_progress_update_param(PROGRESS_ANALYZE_PHASE, + PROGRESS_ANALYZE_PHASE_COMPUTE_HEAP_STATS); The phase you label as computing heap statistics also includes the computation of index statistics. I wouldn't bother separating the computation of heap and index statistics; I'd just have a "compute statistics" phase that begins right after the comment that starts "Compute the statistics."
Understood. Fixed in the attached patch.
+ /* Report that we are now doing index cleanup */ + pgstat_progress_update_param(PROGRESS_ANALYZE_PHASE, + PROGRESS_ANALYZE_PHASE_INDEX_CLEANUP); OK, this doesn't make any sense either. We are not cleaning up the indexes here. We are just closing them and releasing resources. I don't see why you need this phase at all; it can't last more than some tiny fraction of a second. It seems like you're trying to copy the exact same phases that exist for vacuum instead of thinking about what makes sense for ANALYZE.
Understood. I have removed this phase.
+ /* Report number of heap blocks scaned so far*/ + pgstat_progress_update_param(PROGRESS_ANALYZE_HEAP_BLKS_SCANNED, targblock); While I don't think this is necessarily a bad thing to report, I find it very surprising that you're not reporting what seems to me like the most useful thing here - namely, the number of blocks that will be sampled in total and the number of those that you've selected. Instead, you're just reporting the length of the relation and the last-sampled block, which is a less-accurate guide to total progress. There are enough counters to report both things, so maybe we should. + /* Report total number of sample rows*/ + pgstat_progress_update_param(PROGRESS_ANALYZE_TOTAL_SAMPLE_ROWS, numrows); As an alternative to the previous paragraph, yet another thing you could report is number of rows sampled out of total rows to be sampled. But this isn't the way to do it: you are reporting the number of rows you sampled only once you've finished sampling. The point of progress reporting is to report progress -- that is, to report this value as it's updated, not just to report it when ANALYZE is practically done and the value has reached its maximum.
Understood. I have reported number of rows sampled out of total rows to be sampled. I have not reported the number of blocks that will be sampled in total. So currect pg_stat_progress_analyze view looks like: =# \d+ pg_stat_progress_analyze View "pg_catalog.pg_stat_progress_analyze"Column | Type | Collation | Nullable | Default | Storage | Descripti
on ------------------------+---------+-----------+----------+---------+----------+---------- ---pid | integer | | | | plain | datid | oid | | | | plain | datname | name | | | | plain | relid | oid | | | | plain | phase | text | | | | extended | num_target_sample_rows | bigint | | | | plain | num_rows_sampled | bigint | | | | plain |
View definition: SELECT s.pid, s.datid, d.datname, s.relid, CASE s.param1 WHEN 0 THEN 'initializing'::text WHEN 1 THEN 'collecting sample rows'::text WHEN 2 THEN 'computing statistics'::text ELSE NULL::text END AS phase, s.param2 AS num_target_sample_rows, s.param3 AS num_rows_sampledFROM pg_stat_get_progress_info('ANALYZE'::text) s(pid, datid, relid, param1, param2, p
aram3, param4, param5, param6, param7, param8, param9, param10) LEFT JOIN pg_database d ON s.datid = d.oid; Comment?
The documentation for this patch isn't very good, either. You forgot to update the part of the documentation that says that VACUUM is the only command that currently supports progress reporting, and the descriptions of the phases and progress counters are brief and not entirely accurate - e.g. heap_blks_scanned is not actually the number of heap blocks scanned, because we are sampling, not reading all the blocks.
Understood. I have updated the documentation. I will also try to improve documentation.
When adding calls to the progress reporting functions, please consider whether a blank line should be added before or after the new code, or both, or neither. I'd say some blank lines are needed in a few places where you didn't add them.
+1. I have added blank lines in a few places. Regards, Vinayak Pokale
-- Sent via pgsql-hackers mailing list (email@example.com) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers