Following up on my previous email, I have added below some use cases which I find very relevant when one needs to know the progress of a SQL query. The command can be used with any SQL query (SELECT, UPDATE, DELETE, INSERT).
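As a reading aid for the reports below: each percentage is simply the ratio of a counter collected during execution to the planner's estimated total, so it can exceed 100% when the estimate is low (see the DELETE use case). A minimal sketch of that arithmetic, using a hypothetical helper name that is not taken from the patch:

```c
#include <stdint.h>

/* Hypothetical helper (not from the patch): percentage of work done,
 * as printed in the reports, e.g. "rows 3514321/4166700 84%".
 * The denominator is a planner estimate, so the result may exceed 100%. */
int
progress_pct(uint64_t done, uint64_t total)
{
    if (total == 0)
        return 0;               /* avoid division by zero */
    return (int) (done * 100 / total);
}
```

For instance, in the DELETE use case below, progress_pct(1113104, 848485) yields 131, matching the reported "rows 1113104/848485 131%".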
The tables used have been created with:

CREATE TABLE T_1M (id integer, md5 text);
INSERT INTO T_1M SELECT generate_series(1,1000000) AS id, md5(random()::text) AS md5;
CREATE TABLE T_10M (id integer, md5 text);
INSERT INTO T_10M SELECT generate_series(1,10000000) AS id, md5(random()::text) AS md5;

All the different leaf node types are implemented.

1/ Parallel select with sort (no index)
=======================================

=> Terminal running the long SQL query:

test=# select * from t_10M order by md5;

=> Terminal monitoring SQL query progression:

test=# select pid,query from pg_stat_activity ;
  pid  |                  query
-------+------------------------------------------
  8062 |
  8065 |
 19605 | select pid,query from pg_stat_activity ;
 20830 | select * from t_10M order by md5;
 20832 | select * from t_10M order by md5;
 20833 | select * from t_10M order by md5;
  8060 |
  8059 |
  8061 |
(9 rows)

test=# PROGRESS 20830
test-# ;
                              PLAN PROGRESS
------------------------------------------------------------------------
 Gather Merge
   -> Sort=> dumping tuples to tapes / merging tapes
        rows r/w merge 2167923/2167908 rows r/w effective 0/3514320 0%
        Sort Key: md5
        -> Parallel Seq Scan on t_10m => rows 3514321/4166700 84%
(5 rows)

test=# PROGRESS 20830;
                               PLAN PROGRESS
----------------------------------------------------------------------------
 Gather Merge
   -> Sort=> final merge sort on tapes
        rows r/w merge 4707198/4691167 rows r/w effective 16016/3514320 0%
        Sort Key: md5
        -> Parallel Seq Scan on t_10m => rows 3514321/4166700 84%
(5 rows)

test=# PROGRESS 20830;
                               PLAN PROGRESS
-----------------------------------------------------------------------------
 Gather Merge
   -> Sort=> final merge sort on tapes
        rows r/w merge 4809857/4691167 rows r/w effective 118675/3514320 3%
        Sort Key: md5
        -> Parallel Seq Scan on t_10m => rows 3514321/4166700 84%
(5 rows)

test=# PROGRESS 20830;
                               PLAN PROGRESS
-----------------------------------------------------------------------------
 Gather Merge
   -> Sort=> final merge sort on tapes
        rows r/w merge 4883715/4691167 rows r/w effective 192533/3514320 5%
        Sort Key: md5
        -> Parallel Seq Scan on t_10m => rows 3514321/4166700 84%
(5 rows)

test=# PROGRESS 20830;
                               PLAN PROGRESS
-----------------------------------------------------------------------------
 Gather Merge
   -> Sort=> final merge sort on tapes
        rows r/w merge 4948381/4691167 rows r/w effective 257199/3514320 7%
        Sort Key: md5
        -> Parallel Seq Scan on t_10m => rows 3514321/4166700 84%
(5 rows)

test=# PROGRESS 20830;
                               PLAN PROGRESS
-----------------------------------------------------------------------------
 Gather Merge
   -> Sort=> final merge sort on tapes
        rows r/w merge 5022137/4691167 rows r/w effective 330955/3514320 9%
        Sort Key: md5
        -> Parallel Seq Scan on t_10m => rows 3514321/4166700 84%
(5 rows)

test=# PROGRESS 20830;
                                PLAN PROGRESS
------------------------------------------------------------------------------
 Gather Merge
   -> Sort=> final merge sort on tapes
        rows r/w merge 5079083/4691167 rows r/w effective 387901/3514320 11%
        Sort Key: md5
        -> Parallel Seq Scan on t_10m => rows 3514321/4166700 84%
(5 rows)

test=# PROGRESS 20830;
                                PLAN PROGRESS
------------------------------------------------------------------------------
 Gather Merge
   -> Sort=> final merge sort on tapes
        rows r/w merge 5144499/4691167 rows r/w effective 453317/3514320 12%
        Sort Key: md5
        -> Parallel Seq Scan on t_10m => rows 3514321/4166700 84%
(5 rows)

test=# PROGRESS 20830;
    PLAN PROGRESS
----------------------
 <out of transaction>
(1 row)

test=#

The SQL query was interrupted before completion.

2/ Insert into table
====================

=> Terminal running the long SQL query:

test=# INSERT INTO T_10M SELECT generate_series(10000001, 12000000) AS id, md5(random()::text) AS md5;

=> Terminal monitoring SQL query progression:

test=# PROGRESS 20830;
    PLAN PROGRESS
----------------------
 <out of transaction>
(1 row)

test=# PROGRESS 20830;
     PLAN PROGRESS
-----------------------
 Insert => rows 718161
   -> ProjectSet
      -> Result
(3 rows)

test=#
PROGRESS 20830;
     PLAN PROGRESS
------------------------
 Insert => rows 1370255
   -> ProjectSet
      -> Result
(3 rows)

test=# PROGRESS 20830;
     PLAN PROGRESS
------------------------
 Insert => rows 1916731
   -> ProjectSet
      -> Result
(3 rows)

test=# PROGRESS 20830;
 PLAN PROGRESS
----------------
 <idle backend>
(1 row)

test=#

3/ Delete with like clause
==========================

=> Terminal running the long SQL query:

test=# DELETE FROM T_10M WHERE md5 like '%cb%';

=> Terminal monitoring SQL query progression:

test=# PROGRESS 20830;
 PLAN PROGRESS
----------------
 <idle backend>
(1 row)

test=# PROGRESS 20830;
                             PLAN PROGRESS
----------------------------------------------------------------------
 Delete => rows 91906
   -> Seq Scan on t_10m => rows 91906/848485 10% blks 6668/100000 6%
        Filter: (md5 ~~ '%cb%'::text)
(3 rows)

test=# PROGRESS 20830;
                              PLAN PROGRESS
-------------------------------------------------------------------------
 Delete => rows 151900
   -> Seq Scan on t_10m => rows 151900/848485 17% blks 11019/100000 11%
        Filter: (md5 ~~ '%cb%'::text)
(3 rows)

test=# PROGRESS 20830;
                              PLAN PROGRESS
-------------------------------------------------------------------------
 Delete => rows 309533
   -> Seq Scan on t_10m => rows 309533/848485 36% blks 22471/100000 22%
        Filter: (md5 ~~ '%cb%'::text)
(3 rows)

test=# PROGRESS 20830;
                              PLAN PROGRESS
-------------------------------------------------------------------------
 Delete => rows 705968
   -> Seq Scan on t_10m => rows 705968/848485 83% blks 51274/100000 51%
        Filter: (md5 ~~ '%cb%'::text)
(3 rows)

test=# PROGRESS 20830;
                              PLAN PROGRESS
--------------------------------------------------------------------------
 Delete => rows 913843
   -> Seq Scan on t_10m => rows 913843/848485 107% blks 66417/100000 66%
        Filter: (md5 ~~ '%cb%'::text)
(3 rows)

test=# PROGRESS 20830;
                               PLAN PROGRESS
---------------------------------------------------------------------------
 Delete => rows 1113104
   -> Seq Scan on t_10m => rows 1113104/848485 131% blks 80881/100000 80%
        Filter: (md5 ~~ '%cb%'::text)
(3 rows)

test=# PROGRESS 20830;
 PLAN PROGRESS
----------------
 <idle backend>
(1 row)

test=#

The monitoring report above shows:
- the Seq Scan node with the number of rows scanned and the number of blocks scanned/read;
- the Delete node with the number of rows deleted.

4/ Select with offset and limit clause
======================================

=> Terminal running the long SQL query:

test=# select * from t_10M order by md5 offset 80 limit 10;

=> Terminal monitoring SQL query progression:

test=# \watch PROGRESS 20830;

Wed 19 Apr 2017 04:36:16 PM CEST (every 1s)

 PLAN PROGRESS
----------------
 <idle backend>
(1 row)

Wed 19 Apr 2017 04:36:17 PM CEST (every 1s)

 PLAN PROGRESS
----------------
 <idle backend>
(1 row)

Wed 19 Apr 2017 04:36:18 PM CEST (every 1s)

                                PLAN PROGRESS
------------------------------------------------------------------------------
 Limit => offset 0% limit 0%
   -> Sort=> loading tuples in memory 90
        Sort Key: md5
        -> Seq Scan on t_10m => rows 174392/11586584 1% blks 1640/100000 1%
(4 rows)

...

                                  PLAN PROGRESS
----------------------------------------------------------------------------------
 Limit => offset 0% limit 0%
   -> Sort=> loading tuples in memory 90
        Sort Key: md5
        -> Seq Scan on t_10m => rows 1656828/11586584 14% blks 15600/100000 15%
(4 rows)

...

                                  PLAN PROGRESS
----------------------------------------------------------------------------------
 Limit => offset 0% limit 0%
   -> Sort=> loading tuples in memory 90
        Sort Key: md5
        -> Seq Scan on t_10m => rows 4954207/11586584 42% blks 46640/100000 46%
(4 rows)

Wed 19 Apr 2017 04:36:35 PM CEST (every 1s)

...

                                  PLAN PROGRESS
----------------------------------------------------------------------------------
 Limit => offset 0% limit 0%
   -> Sort=> loading tuples in memory 90
        Sort Key: md5
        -> Seq Scan on t_10m => rows 7837687/11586584 67% blks 73772/100000 73%
(4 rows)

Wed 19 Apr 2017 04:36:41 PM CEST (every 1s)

...
                                  PLAN PROGRESS
-----------------------------------------------------------------------------------
 Limit => offset 0% limit 0%
   -> Sort=> loading tuples in memory 90
        Sort Key: md5
        -> Seq Scan on t_10m => rows 10378786/11586584 89% blks 97690/100000 97%
(4 rows)

Wed 19 Apr 2017 04:36:49 PM CEST (every 1s)

 PLAN PROGRESS
----------------
 <idle backend>
(1 row)

5/ Sample scan
==============

=> Terminal running the long SQL query:

test=# select * from t_10m tablesample system(50);

=> Terminal monitoring SQL query progression:

 PLAN PROGRESS
----------------
 <idle backend>
(1 row)

Wed 19 Apr 2017 04:44:12 PM CEST (every 1s)

                              PLAN PROGRESS
-----------------------------------------------------------------------
 Sample Scan on t_10m => rows 783274/5793292 13% blks 14616/100000 14%
   Sampling: system ('50'::real)
(2 rows)

Wed 19 Apr 2017 04:44:13 PM CEST (every 1s)

                              PLAN PROGRESS
------------------------------------------------------------------------
 Sample Scan on t_10m => rows 2514675/5793292 43% blks 47076/100000 47%
   Sampling: system ('50'::real)
(2 rows)

Wed 19 Apr 2017 04:44:14 PM CEST (every 1s)

                              PLAN PROGRESS
------------------------------------------------------------------------
 Sample Scan on t_10m => rows 4031400/5793292 69% blks 75625/100000 75%
   Sampling: system ('50'::real)
(2 rows)

Wed 19 Apr 2017 04:44:15 PM CEST (every 1s)

 PLAN PROGRESS
----------------
 <idle backend>
(1 row)

2017-04-19 16:13 GMT+02:00 Remi Colinet <remi.coli...@gmail.com>:

> Maksim,
>
> 2017-04-18 20:31 GMT+02:00 Maksim Milyutin <m.milyu...@postgrespro.ru>:
>
>> On 18.04.2017 17:39, Remi Colinet wrote:
>>
>>> Hello Maksim,
>>>
>>> The core implementation I suggested for the new PROGRESS command uses
>>> different functions from the ones used by EXPLAIN for its core
>>> implementation. Some source code is shared with the EXPLAIN command,
>>> but this shared code is only related to quals, properties, children,
>>> subPlans and a few other nodes.
>>>
>>> All other code for PROGRESS is new code.
>>>
>>> I don't believe explain.c code can be fully shared with that of the
>>> new PROGRESS command. These 2 commands have different purposes.
>>> The core code used for the new PROGRESS command is very different from
>>> the core code used for EXPLAIN.
>>
>> Perhaps you will be forced to duplicate significant snippets of
>> functionality from explain.c into your progress.c.
>
> Currently, little code is duplicated between the EXPLAIN and PROGRESS
> commands. The duplicated code could be moved to the file
> src/backend/commands/report.c, which is used to gather code shared
> between the 2 commands. I will try to complete this code sharing as
> much as possible.
>
> The main point is that PROGRESS uses the same design pattern as EXPLAIN
> by parsing the query tree. The work horse of the PROGRESS command is
> ProgressNode(), which calls sub nodes recursively until we reach leaf
> nodes such as SeqScan, IndexScan, TupleStore, Sort, Material, ... .
> The EXPLAIN command uses a similar work horse with the function
> ExplainNode(), which eventually calls different leaf node functions.
>
> Some of the leaf nodes which are common to the 2 commands have been put
> in the file src/backend/commands/report.c. Maybe some further code
> sharing is also possible for the work horse, by using a template
> function which would call the EXPLAIN-specific or PROGRESS-specific
> leaf node functions.
>
>>> Regarding the queryDesc state of a SQL query upon receiving a request
>>> to report its execution progress, it does not bring any issue. The
>>> request is noted when the signal is received by the monitored backend.
>>> Then, the backend continues on its execution code path. When
>>> interrupts are checked in the executor code, the request will be
>>> dealt with.
>>
>> Yes, interrupts are checked in the CHECK_FOR_INTERRUPTS entries.
>>
>>> When the request is being dealt with, the monitored backend will stop
>>> its execution and report the progress of the SQL query. Whatever the
>>> status of the SQL query, the progress.c code checks the status and
>>> reports either that the SQL query does not have a valid status, or
>>> otherwise the current execution state of the SQL query.
>>>
>>> SQL query status checking is about:
>>> - idle transaction
>>> - out of transaction status
>>> - null planned statement
>>> - utility command
>>> - self monitoring
>>>
>>> Other tests can be added if needed to exclude some SQL query states.
>>> Such checking is done in void HandleProgressRequest(void).
>>> I do not see why a SQL query progression would not be possible in this
>>> context. Even when the queryDesc is NULL, we can just report an <idle
>>> transaction> output. This is currently the case with the suggested
>>> patch.
>>
>> It's an interesting question - how much the active query is in a usable
>> state at the stage of execution. Tom Lane noticed that
>> CHECK_FOR_INTERRUPTS doesn't give us a 100% guarantee about full
>> consistency [1].
>
> I wonder what you mean by usable state.
>
> Currently, the suggested code tests the queryDesc pointer and all the
> sub node pointers in order to detect NULL pointers. When the progress
> report is collected by the backend, this backend does the collection and
> consequently does not run the query. So the query tree is not being
> modified. At this moment, whatever the query state, we can manage to
> deal with its static state. It is only a tree, which could also be just
> a NULL pointer in the most extreme case. Such a case is dealt with in
> the current code.
>
>>> So far, I've found this new command very handy. It allows one to
>>> evaluate the time needed to complete a SQL query.
>>
>> Could you explain how you get the percent of execution for nodes of the
>> plan tree and overall for the query?
>
> The progress of execution of the query is computed as follows, at 2
> different places for each leaf node type (Scan, IndexScan, Sort,
> Material, TupleStore, ...):
>
> - one place in the executor code, or in the access methods code, or in
> the sort utilities code, used during the execution of the SQL query, in
> which the following values are counted, for instance: rows R/W, blocks
> R/W, tapes R/W used for sorts, tuple store R/W, ... . Some of these
> values are already computed in the current PostgreSQL official source
> code. Some other values had to be added and collected.
>
> - one place in the leaf function of each node type (ProgressScan(),
> ProgressSort(), ...) in which percentages are computed and then dumped
> in the report, together with the raw values collected during execution.
> The dump details can be selected with the VERBOSE option of the PROGRESS
> command (for instance: # PROGRESS VERBOSE $pid).
>
> For instance:
>
> 1/ read/write rows are collected when running the executor, in the file
> src/backend/executor/execProcnode.c
> ==============================================================================
>
> This is already implemented in the current official source tree. Nothing
> more needs to be implemented to collect values (the total number of rows
> and the number of rows already fetched are collected).
> The report is implemented in the leaf function ProgressScanRows().
>
> 2/ read/write blocks are collected in the file
> src/backend/access/heap/heapam.c
> ==========================================================
>
> This is already implemented in the current official source tree. Nothing
> more needs to be implemented to collect values during execution.
> The report is implemented in the leaf function ProgressScanBlks().
>
> 3/ the sort progressions are collected in the file
> src/backend/utils/sort/tuplesort.c
> ==========================================================
>
> [root@rco pg]# git diff --stat master.. src/backend/utils/sort/tuplesort.c
>  src/backend/utils/sort/tuplesort.c | 142 ++++++++++++++++++++++-------------
>  1 file changed, 127 insertions(+), 15 deletions(-)
> [root@rco pg]#
>
> New fields have been added to compute the different I/O operations
> (read/write/merge/...) per tape, for instance, during a sort on tapes.
> The report for the Sort node is implemented in the leaf function
> ProgressSort().
>
> 4/ the tuple store progressions are computed in the file
> src/backend/utils/sort/tuplestore.c
> =================================================================
>
> [root@rco pg]# git diff --stat master.. src/backend/utils/sort/tuplestore.c
>  src/backend/utils/sort/tuplestore.c | 73 +++++++++++++++++++-------------
>  1 file changed, 56 insertions(+), 17 deletions(-)
> [root@rco pg]#
>
> New fields have been added to collect the I/O needed for such a tuple
> store. The report for the TupleStore node is implemented in the leaf
> function ProgressTupleStore().
>
> Other node types have been implemented: TidScan, IndexScan, LimitScan,
> CustomScan, Hash, ModifyTable. Such nodes may require some new fields to
> collect values during the SQL query execution.
>
> Overall, the overhead caused by the new values collected during the SQL
> query execution is very low. Only a few values need to be collected.
>
>>> A further improvement would be to report the memory, disk and time
>>> resources used by the monitored backend. An overuse of memory, disk
>>> and time resources can prevent the SQL query from completing.
>>
>> This functionality is somehow implemented in explain.c. You can see my
>> patch to this file [2]. I only manipulate runtime statistics (data in
>> the structure 'Instrumentation') to achieve the printing of the state
>> of a running query.
>
> I will check your patch and try to add such a feature to the current
> patch.
> It provides a valuable hint to estimate whether a SQL query has a chance
> to complete and will not reach the resource limits.
>
>> 1. https://www.postgresql.org/message-id/24182.1472745492%40sss.pgh.pa.us
>> 2. https://github.com/postgrespro/pg_query_state/blob/master/runtime_explain.patch
>
> Thanks for your suggestions and comments.
>
> Regards
> Remi
>
>> --
>> Maksim Milyutin
>> Postgres Professional: http://www.postgrespro.com
>> Russian Postgres Company
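P.S. To make the ProgressNode() design discussed above more concrete, here is a rough sketch of the recursive walk with a per-leaf report line. The node structure, field names and functions are made up for illustration; they are not the patch's actual types or code:

```c
#include <stdio.h>
#include <stdint.h>

/* Made-up, simplified plan node; real PostgreSQL plan nodes are far richer. */
typedef enum { NODE_GATHER_MERGE, NODE_SORT, NODE_SEQ_SCAN } NodeTag;

typedef struct PlanNode
{
    NodeTag          tag;
    const char      *name;
    uint64_t         rows_done;   /* counter updated while the query runs */
    uint64_t         rows_total;  /* planner estimate of total rows */
    struct PlanNode *child;       /* single child, for simplicity */
} PlanNode;

/* Format one report line; leaf scans also get counters and a percentage. */
int
progress_line(char *buf, size_t len, const PlanNode *node, int depth)
{
    if (node->tag == NODE_SEQ_SCAN && node->rows_total > 0)
        return snprintf(buf, len, "%*s%s => rows %llu/%llu %d%%",
                        depth * 3, "", node->name,
                        (unsigned long long) node->rows_done,
                        (unsigned long long) node->rows_total,
                        (int) (node->rows_done * 100 / node->rows_total));
    return snprintf(buf, len, "%*s%s", depth * 3, "", node->name);
}

/* Work horse in the style of ProgressNode()/ExplainNode(): recurse from the
 * root until the leaf nodes are reached, emitting one line per node. */
void
progress_tree(const PlanNode *node, int depth, FILE *out)
{
    char line[256];

    if (node == NULL)
        return;                 /* a NULL tree is reported upstream instead */
    progress_line(line, sizeof(line), node, depth);
    fprintf(out, "%s\n", line);
    progress_tree(node->child, depth + 1, out);
}
```

The real command additionally handles multiple children, quals, sort/tape state and the various backend states (<idle backend>, <out of transaction>), but the overall shape is this depth-first descent.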