Re: [HACKERS] CLUSTER command progress monitor
On 2017/09/12 21:20, Tatsuro Yamada wrote: On 2017/09/11 23:38, Robert Haas wrote: On Sun, Sep 10, 2017 at 10:36 PM, Tatsuro Yamada wrote: Thanks for the comment. As you know, CLUSTER command uses SEQ SCAN or INDEX SCAN as a scan method by cost estimation. In the case of SEQ SCAN, these two phases not overlap. However, in INDEX SCAN, it overlaps. Therefore I created the phase of "scan heap and write new heap" when INDEX SCAN was selected. I agree that progress reporting for sort is difficult. So it only reports the phase ("sorting tuples") in the current design of progress monitor of cluster. It doesn't report counter of sort. Doesn't that make it almost useless? I would guess that scanning the heap and writing the new heap would ordinarily account for most of the runtime, or at least enough that you're going to want something more than just knowing that's the phase you're in. Hmmm, Should I add a counter in tuplesort.c? (tuplesort_performsort()) I know that external merge sort takes a time than quick sort. I'll try investigating how to get a counter from external merge sort processing. Is this the right way? The patch is getting the value reported as heap_tuples_total from OldHeap->rd_rel->reltuples. I think this is pointless: the user can see that value anyway if they wish. The point of the progress counters is to expose things the user couldn't otherwise see. It's also not necessarily accurate: it's only an estimate in the best case, and may be way off if the relation has recently be extended by a large amount. I think it's pretty important that we try hard to only report values that are known to be accurate, because users hate (and mock) inaccurate progress reports. Do you mean to use the number of rows by using below calculation instead OldHeap->rd_rel->reltuples? estimate rows = physical table size / average row length No, I mean don't report it at all. The caller can do that calculation if they wish, without any help from the progress reporting machinery. I see. I'll remove that column on next patch. I will summarize the current design and future corrections before sending the next patch. === Current design === CLUSTER command may use Index Scan or Seq Scan when scanning the heap. Depending on which one is chosen, the command will proceed in the following sequence of phases: * Scan method: Seq Scan 1. scanning heap(*1) 2. sorting tuples (*2) 3. writing new heap (*1) 5. swapping relation files (*2) 6. rebuilding index (*2) 7. performing final cleanup (*2) * Scan method: Index Scan 4. scan heap and write new heap (*1) 5. swapping relation files (*2) 6. rebuilding index (*2) 7. performing final cleanup (*2) VACUUM FULL command will proceed in the following sequence of phases: 1. scanning heap(*1) 5. swapping relation files (*2) 6. rebuilding index (*2) 7. performing final cleanup (*2) (*1): increasing the value in heap_tuples_scanned column (*2): only shows the phase in the phase column The view provides the information of CLUSTER command progress details as follows # \d pg_stat_progress_cluster View "pg_catalog.pg_stat_progress_cluster" Column | Type | Collation | Nullable | Default ---+-+---+--+- pid | integer | | | datid | oid | | | datname | name| | | relid | oid | | | command | text| | | phase | text| | | scan_method | text| | | scan_index_relid | bigint | | | heap_tuples_total | bigint | | | heap_tuples_scanned | bigint | | | heap_tuples_vacuumed | bigint | | | heap_tuples_recently_dead | bigint | | | === It will be changed on next patch === - Rename to pg_stat_progress_reolg from pg_stat_progress_cluster - Remove heap_tuples_total column from the view - Add a progress counter in the phase of "sorting tuples" (difficult?!) === My test case as a bonus === I share my test case of progress monitor. If someone wants to watch the current progress monitor, you can use this test case as a example. [Terminal1] Run this query on psql: select * from pg_stat_progress_cluster; \watch 0.05 [Terminal2] Run these queries on psql: drop table t1; create table t1 as select a, random() * 1000 as b from generate_series(0, ) a; create index idx_t1 on
Re: [HACKERS] CLUSTER command progress monitor
On 2017/09/11 23:38, Robert Haas wrote: On Sun, Sep 10, 2017 at 10:36 PM, Tatsuro Yamada wrote: Thanks for the comment. As you know, CLUSTER command uses SEQ SCAN or INDEX SCAN as a scan method by cost estimation. In the case of SEQ SCAN, these two phases not overlap. However, in INDEX SCAN, it overlaps. Therefore I created the phase of "scan heap and write new heap" when INDEX SCAN was selected. I agree that progress reporting for sort is difficult. So it only reports the phase ("sorting tuples") in the current design of progress monitor of cluster. It doesn't report counter of sort. Doesn't that make it almost useless? I would guess that scanning the heap and writing the new heap would ordinarily account for most of the runtime, or at least enough that you're going to want something more than just knowing that's the phase you're in. Hmmm, Should I add a counter in tuplesort.c? (tuplesort_performsort()) I know that external merge sort takes a time than quick sort. I'll try investigating how to get a counter from external merge sort processing. Is this the right way? The patch is getting the value reported as heap_tuples_total from OldHeap->rd_rel->reltuples. I think this is pointless: the user can see that value anyway if they wish. The point of the progress counters is to expose things the user couldn't otherwise see. It's also not necessarily accurate: it's only an estimate in the best case, and may be way off if the relation has recently be extended by a large amount. I think it's pretty important that we try hard to only report values that are known to be accurate, because users hate (and mock) inaccurate progress reports. Do you mean to use the number of rows by using below calculation instead OldHeap->rd_rel->reltuples? estimate rows = physical table size / average row length No, I mean don't report it at all. The caller can do that calculation if they wish, without any help from the progress reporting machinery. I see. I'll remove that column on next patch. Regards, Tatsuro Yamada -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] CLUSTER command progress monitor
On 2017/09/08 18:55, Robert Haas wrote: On Wed, Aug 30, 2017 at 10:12 PM, Tatsuro Yamada wrote: 1. scanning heap 2. sort tuples These two phases overlap, though. I believe progress reporting for sorts is really hard. In the simple case where the data fits in work_mem, none of the work of the sort gets done until all the data is read. Once you switch to an external sort, you're writing batch files, so a lot of the work is now being done during data loading. But as the number of batch files grows, the final merge at the end becomes an increasingly noticeable part of the cost, and eventually you end up needing multiple merge passes. I think we need some smart way to report on sorts so that we can tell how much of the work has really been done, but I don't know how to do it. Thanks for the comment. As you know, CLUSTER command uses SEQ SCAN or INDEX SCAN as a scan method by cost estimation. In the case of SEQ SCAN, these two phases not overlap. However, in INDEX SCAN, it overlaps. Therefore I created the phase of "scan heap and write new heap" when INDEX SCAN was selected. I agree that progress reporting for sort is difficult. So it only reports the phase ("sorting tuples") in the current design of progress monitor of cluster. It doesn't report counter of sort. heap_tuples_total | bigint | | | The patch is getting the value reported as heap_tuples_total from OldHeap->rd_rel->reltuples. I think this is pointless: the user can see that value anyway if they wish. The point of the progress counters is to expose things the user couldn't otherwise see. It's also not necessarily accurate: it's only an estimate in the best case, and may be way off if the relation has recently be extended by a large amount. I think it's pretty important that we try hard to only report values that are known to be accurate, because users hate (and mock) inaccurate progress reports. Do you mean to use the number of rows by using below calculation instead OldHeap->rd_rel->reltuples? estimate rows = physical table size / average row length I understand that OldHeap->rd_rel->reltuples is sometimes useless because it is correct by auto analyze and it can't perform when under a threshold. I'll add it in next patch and also share more detailed the current design of progress monitor for cluster. Regards, Tatsuro Yamada -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] CLUSTER command progress monitor
On 2017/09/06 16:11, Michael Paquier wrote: On Wed, Sep 6, 2017 at 3:58 PM, Tatsuro Yamada wrote: I revised the patch like this: You should avoid top-posting. I see. I didn't change the name of view (pg_stat_progress_cluster) because I'm not sure whether the new name (pg_stat_progress_reorg) is suitable or not. Here are some ideas: rewrite (incorrect for ALTER TABLE), organize (somewhat fine), order, operate (too general?), bundle, reform, assemble. Thanks for sharing your ideas. I searched the words like a "reform table", "reassemble table" and "reorganize table" on google. I realized "reorganaize table" is good choice than others because many DBMS uses this keyword. Therefore, I'll change the name to it like this: before pg_stat_progress_cluster after pg_stat_progress_reorg (I abbreviate reorganize to reorg.) Does anyone have any suggestions? I'll revise the patch. Regards, Tatsuro Yamada -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ANALYZE command progress checker
On 2017/04/05 10:17, Masahiko Sawada wrote: On Wed, Apr 5, 2017 at 1:49 AM, Robert Haas wrote: On Tue, Apr 4, 2017 at 4:57 AM, Amit Langote wrote: Hmm, you're right. It could be counted with a separate variable initialized to 0 and incremented every time we decide to add a row to the final set of sampled rows, although different implementations of AcquireSampleRowsFunc have different ways of deciding if a given row will be part of the final set of sampled rows. On the other hand, if we decide to count progress in terms of blocks as you suggested afraid, I'm afraid that FDWs won't be able to report the progress. I think it may be time to push this patch out to v11. It was submitted one day before the start of the last CommitFest, the design wasn't really right, and it's not clear even now that we know what the right design is. And we're pretty much out of time. +1 We're encountering the design issue and it takes more time to find out right design including FDWs support. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center Hi Vinayak, I found a typo in your patch: s/taht/that/ sampling.c /* Report total number of blocks taht will be sampled */ Then, regarding FDWs support, I believe it means postgres_fdw support (is my understanding right?). You might better to put pgstat_progress_update_param() into these functions, maybe. postgres_fdw.c - postgresAnalyzeForeignTable - postgresAcquireSampleRowsFunc And PgFdwAnalyzeState has these variables, hopefully you can get current number of rows as a progress indicator. sturuct PgFdwAnalyzeState ... /* collected sample rows */ HeapTuple *rows; /* array of size targrows */ int targrows; /* target # of sample rows */ int numrows;/* # of sample rows collected */ /* for random sampling */ double samplerows; /* # of rows fetched */ double rowstoskip; /* # of rows to skip before next sample */ ... I hope it will help you. Regards, Tatsuro Yamada -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Minor code improvement to postgresGetForeignPlan
On 2017/09/07 6:52, Tom Lane wrote: Tatsuro Yamada writes: The declaration of postgresGetForeignPlan uses baserel, but the actual definition uses foreignrel. It would be better to sync. Pushed, thanks. regards, tom lane Thanks! Regards, Tatsuro Yamada -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] CLUSTER command progress monitor
Hi Hackers, I revised the patch like this: - Add "command" column in the view It tells that the running command is CLUSTER or VACUUM FULL. - Enable VACUUM FULL progress monitor Add heap_tuples_vacuumed and heap_tuples_recently_dead as a counter in the view. Sequence of phases are below: 1. scanning heap 5. swapping relation files 6. rebuild index 7. performing final cleanup I didn't change the name of view (pg_stat_progress_cluster) because I'm not sure whether the new name (pg_stat_progress_reorg) is suitable or not. Any comments or suggestion are welcome. Thanks, Tatsuro Yamada On 2017/09/04 20:17, Tatsuro Yamada wrote: On 2017/09/04 15:38, Michael Paquier wrote: On Thu, Aug 31, 2017 at 11:12 AM, Tatsuro Yamada wrote: Then I have questions. * Should we have separate views for them? Or should both be covered by the same view with some indication of which command (CLUSTER or VACUUM FULL) is actually running? Using the same view for both, and tell that this is rather VACUUM or CLUSTER in the view, would be better IMO. Coming up with a name more generic than pg_stat_progress_cluster may be better though if this speaks with VACUUM FULL as well, user-facing documentation does not say that VACUUM FULL is actually CLUSTER. Thanks for sharing your thoughts. Agreed. I'll add new column like a "command" to tell whether running CLUSTER or VACUUM. And how about this new view name? - pg_stat_progress_reorg Is it more general name than previous name if it covers both commands? I'll add this patch to CF2017-09. Any comments or suggestion are welcome. Nice to see that you are taking the time to implement patches for upstream, Yamada-san! Same here. :) I'd like to contribute creating feature that is for DBA and users. Progress monitoring feature is important from my DBA experiences. I'm happy if you lend your hand. Thanks, Tatsuro Yamada diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml index 38bf636..33cedc0 100644 --- a/doc/src/sgml/monitoring.sgml +++ b/doc/src/sgml/monitoring.sgml @@ -332,6 +332,14 @@ postgres 27093 0.0 0.0 30096 2752 ?Ss 11:34 0:00 postgres: ser + + pg_stat_progress_clusterpg_stat_progress_cluster + One row for each backend running + CLUSTER and VACUUM FULL, showing current progress. + See . + + + @@ -3233,9 +3241,9 @@ SELECT pg_stat_get_backend_pid(s.backendid) AS pid, PostgreSQL has the ability to report the progress of - certain commands during command execution. Currently, the only command - which supports progress reporting is VACUUM. This may be - expanded in the future. + certain commands during command execution. Currently, the suppoted + progress reporting commands are VACUUM and CLUSTER. + This may be expanded in the future. @@ -3247,9 +3255,8 @@ SELECT pg_stat_get_backend_pid(s.backendid) AS pid, one row for each backend (including autovacuum worker processes) that is currently vacuuming. The tables below describe the information that will be reported and provide information about how to interpret it. - Progress reporting is not currently supported for VACUUM FULL - and backends running VACUUM FULL will not be listed in this - view. + Running VACUUM FULL is listed in pg_stat_progress_cluster + view because it uses CLUSTER command. See . @@ -3427,6 +3434,228 @@ SELECT pg_stat_get_backend_pid(s.backendid) AS pid, + + + CLUSTER Progress Reporting + + + Whenever CLUSTER is running, the + pg_stat_progress_cluster view will contain + one row for each backend that is currently clustering or vacuuming (VACUUM FULL). + The tables below describe the information that will be reported and + provide information about how to interpret it. + + + + pg_stat_progress_cluster View + + + + Column + Type + Description + + + + + + pid + integer + Process ID of backend. + + + datid + oid + OID of the database to which this backend is connected. + + + datname + name + Name of the database to which this backend is connected. + + + relid + oid + OID of the table being clustered. + + + command + text + + Current processing command: CLUSTER/VACUUM FULL. + + + + phase + text + + Current processing phase of cluster/vacuum full. See or . + + + + scan_method + text + + Scan method of table: index scan/seq scan. + + + + scan_index_relid + bigint + + OID of the index. + + + + heap_tuples_total + bigint + + Total number of heap tuples in the table. This number is reported + as of the beginning of the scan; tuple
Re: [HACKERS] CLUSTER command progress monitor
On 2017/09/04 15:38, Michael Paquier wrote: On Thu, Aug 31, 2017 at 11:12 AM, Tatsuro Yamada wrote: Then I have questions. * Should we have separate views for them? Or should both be covered by the same view with some indication of which command (CLUSTER or VACUUM FULL) is actually running? Using the same view for both, and tell that this is rather VACUUM or CLUSTER in the view, would be better IMO. Coming up with a name more generic than pg_stat_progress_cluster may be better though if this speaks with VACUUM FULL as well, user-facing documentation does not say that VACUUM FULL is actually CLUSTER. Thanks for sharing your thoughts. Agreed. I'll add new column like a "command" to tell whether running CLUSTER or VACUUM. And how about this new view name? - pg_stat_progress_reorg Is it more general name than previous name if it covers both commands? I'll add this patch to CF2017-09. Any comments or suggestion are welcome. Nice to see that you are taking the time to implement patches for upstream, Yamada-san! Same here. :) I'd like to contribute creating feature that is for DBA and users. Progress monitoring feature is important from my DBA experiences. I'm happy if you lend your hand. Thanks, Tatsuro Yamada -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Minor code improvement to postgresGetForeignPlan
Hi Fujita-san, > The patch looks good to me, so I'll mark this as Ready for Committer. Thanks for reviewing my patch. > (I'm not sure we should do the same thing to the function declaration in > other places such as fdwapi.h and the documentation for consistency, but if > so, I'd vote for leaving that for another patch.) +1 Regards, Tatsuro Yamada -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] CLUSTER command progress monitor
Hi Sawada-san, Thanks for taking your time. I'll be more careful. Regards, Tatsuro Yamada On 2017/09/04 11:51, Masahiko Sawada wrote: On Mon, Sep 4, 2017 at 11:37 AM, Tatsuro Yamada wrote: Hi Sawada-san, Thomas, Thanks for sharing the reggression.diff. I realized Thomas's comment is right. Attached patch is fixed version. Could you try it? Yeah, in my environment the regression test passed. Thanks. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] CLUSTER command progress monitor
Hi Sawada-san, Thomas, Thanks for sharing the reggression.diff. I realized Thomas's comment is right. Attached patch is fixed version. Could you try it? Regards, Tatsuro Yamada NTT Open Source Software Center On 2017/09/01 17:59, Masahiko Sawada wrote: On Fri, Sep 1, 2017 at 3:38 PM, Tatsuro Yamada wrote: Hi Thomas, Any comments or suggestion are welcome. Although this patch updates src/test/regress/expected/rules.out I think perhaps you included the wrong version? That regression test fails for me Thanks for the comment. I use the patch on 7b69b6ce and it's fine. Did you use "initdb" command after "make install"? The pg_stat_progress_cluster view is created in initdb, probably. I also got a regression test error (applied to abe85ef). Here is regression.diff file. *** /home/masahiko/source/postgresql/src/test/regress/expected/rules.out 2017-09-01 17:27:33.680055612 -0700 --- /home/masahiko/source/postgresql/src/test/regress/results/rules.out 2017-09-01 17:28:10.410055596 -0700 *** *** 1819,1824 --- 1819,1849 pg_stat_get_db_conflict_bufferpin(d.oid) AS confl_bufferpin, pg_stat_get_db_conflict_startup_deadlock(d.oid) AS confl_deadlock FROM pg_database d; + pg_stat_progress_cluster| SELECT s.pid, + s.datid, + d.datname, + s.relid, + CASE s.param1 + WHEN 0 THEN 'initializing'::text + WHEN 1 THEN 'scanning heap'::text + WHEN 2 THEN 'sorting tuples'::text + WHEN 3 THEN 'writing new heap'::text + WHEN 4 THEN 'scan heap and write new heap'::text + WHEN 5 THEN 'swapping relation files'::text + WHEN 6 THEN 'rebuilding index'::text + WHEN 7 THEN 'performing final cleanup'::text + ELSE NULL::text + END AS phase, + CASE s.param2 + WHEN 0 THEN 'index scan'::text + WHEN 1 THEN 'seq scan'::text + ELSE NULL::text + END AS scan_method, + s.param3 AS scan_index_relid, + s.param4 AS heap_tuples_total, + s.param5 AS heap_tuples_scanned +FROM (pg_stat_get_progress_info('CLUSTER'::text) s(pid, datid, relid, param1, param2, param3, param4, param5, param6, param7, param8, param9, param10) + LEFT JOIN pg_database d ON ((s.datid = d.oid))); pg_stat_progress_vacuum| SELECT s.pid, s.datid, d.datname, *** *** 1841,1871 s.param7 AS num_dead_tuples FROM (pg_stat_get_progress_info('VACUUM'::text) s(pid, datid, relid, param1, param2, param3, param4, param5, param6, param7, param8, param9, param10) LEFT JOIN pg_database d ON ((s.datid = d.oid))); - pg_stat_progress_cluster| SELECT - s.pid, - s.datid, - d.datname, - s.relid, - CASE s.param1 - WHEN 0 THEN 'initializing'::text - WHEN 1 THEN 'scanning heap'::text - WHEN 2 THEN 'sorting tuples'::text - WHEN 3 THEN 'writing new heap'::text - WHEN 4 THEN 'scan heap and write new heap'::text - WHEN 5 THEN 'swapping relation files'::text - WHEN 6 THEN 'rebuilding index'::text - WHEN 7 THEN 'performing final cleanup'::text - ELSE NULL::text - END AS phase, - CASE S.param2 - WHEN 0 THEN 'index scan' - WHEN 1 THEN 'seq scan' - END AS scan_method, - s.param3 AS index_relid, - s.param4 AS heap_blks_total, - s.param5 AS heap_blks_scanned -FROM (pg_stat_get_progress_info('CLUSTER'::text) s(pid, datid, relid, param1, param2, param3, param4, param5) - LEFT JOIN pg_database d ON ((s.datid = d.oid))); pg_stat_replication| SELECT s.pid, s.usesysid, u.rolname AS usename, --- 1866,1871 == Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml index 38bf636..35a5c63 100644 --- a/doc/src/sgml/monitoring.sgml +++ b/doc/src/sgml/monitoring.sgml @@ -332,6 +332,14 @@ postgres 27093 0.0 0.0 30096 2752 ?Ss 11:34 0:00 postgres: ser + + pg_stat_progress_clusterpg_stat_progress_cluster + One row for each backend running + CLUSTER, showing current progress. + See . + + + @@ -3233,9 +3241,9 @@ SELECT pg_stat_get_backend_pid(s.backendid) AS pid, PostgreSQL has the ability to report the progress of - certain commands during command execution. Currently, the only command - which supports progress reporting
Re: [HACKERS] CLUSTER command progress monitor
Hi Thomas, Any comments or suggestion are welcome. Although this patch updates src/test/regress/expected/rules.out I think perhaps you included the wrong version? That regression test fails for me Thanks for the comment. I use the patch on 7b69b6ce and it's fine. Did you use "initdb" command after "make install"? The pg_stat_progress_cluster view is created in initdb, probably. Regards, Tatsuro Yamada -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] CLUSTER command progress monitor
Hi, Following is a proposal for reporting the progress of CLUSTER command: It seems that the following could be the phases of CLUSTER processing: 1. scanning heap 2. sort tuples 3. writing new heap 4. scan heap and write new heap 5. swapping relation files 6. rebuild index 7. performing final cleanup These phases are based on Rahila's presentation at PGCon 2017 (https://www.pgcon.org/2017/schedule/attachments/472_Progress%20Measurement%20PostgreSQL.pdf) and I added some phases on it. CLUSTER command may use Index Scan or Seq Scan when scanning the heap. Depending on which one is chosen, the command will proceed in the following sequence of phases: * Seq Scan 1. scanning heap 2. sort tuples 3. writing new heap 5. swapping relation files 6. rebuild index 7. performing final cleanup * Index Scan 4. scan heap and write new heap 5. swapping relation files 6. rebuild index 7. performing final cleanup The view provides the information of CLUSTER command progress details as follows postgres=# \d pg_stat_progress_cluster View "pg_catalog.pg_stat_progress_cluster" Column| Type | Collation | Nullable | Default -+-+---+--+- pid | integer | | | datid | oid | | | datname | name| | | relid | oid | | | phase | text| | | scan_method | text| | | scan_index_relid| bigint | | | heap_tuples_total | bigint | | | heap_tuples_scanned | bigint | | | Then I have questions. * Should we have separate views for them? Or should both be covered by the same view with some indication of which command (CLUSTER or VACUUM FULL) is actually running? I mean this progress monitor could be covering not only CLUSTER command but also VACUUM FULL command. * I chose tuples as scan heap's counter (heap_tuples_scanned) since it's not easy to get current blocks from Index Scan. Is it Ok? I'll add this patch to CF2017-09. Any comments or suggestion are welcome. Regards, Tatsuro Yamada NTT Open Source Software Center diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml index 5575c2c..18fe2c6 100644 --- a/doc/src/sgml/monitoring.sgml +++ b/doc/src/sgml/monitoring.sgml @@ -332,6 +332,14 @@ postgres 27093 0.0 0.0 30096 2752 ?Ss 11:34 0:00 postgres: ser + + pg_stat_progress_clusterpg_stat_progress_cluster + One row for each backend running + CLUSTER, showing current progress. + See . + + + @@ -3229,9 +3237,9 @@ SELECT pg_stat_get_backend_pid(s.backendid) AS pid, PostgreSQL has the ability to report the progress of - certain commands during command execution. Currently, the only command - which supports progress reporting is VACUUM. This may be - expanded in the future. + certain commands during command execution. Currently, the suppoted + progress reporting commands are VACUUM and CLUSTER. + This may be expanded in the future. @@ -3423,6 +3431,157 @@ SELECT pg_stat_get_backend_pid(s.backendid) AS pid, + + + CLUSTER Progress Reporting + + + Whenever CLUSTER is running, the + pg_stat_progress_cluster view will contain + one row for each backend that is currently clustering. + The tables below describe the information that will be reported and + provide information about how to interpret it. + + + + pg_stat_progress_cluster View + + + + Column + Type + Description + + + + + + pid + integer + Process ID of backend. + + + datid + oid + OID of the database to which this backend is connected. + + + datname + name + Name of the database to which this backend is connected. + + + relid + oid + OID of the table being clustered. + + + phase + text + + Current processing phase of cluster. See . + + + + scan_method + text + + Scan method of table: index scan/seq scan. + + + + scan_index_relid + bigint + + OID of the index. + + + + heap_tuples_total + bigint + + Total number of heap tuples in the table. This number is reported + as of the beginning of the scan; tuples added later will not be (and + need not be) visited by this CLUSTER. + + + + heap_tuples_scanned + bigint + + Number of heap tuples scanned. + This counter only advances when the phase is scanning heap, + writing new heap an
[HACKERS] Minor code improvement to postgresGetForeignPlan
Hi, The declaration of postgresGetForeignPlan uses baserel, but the actual definition uses foreignrel. It would be better to sync. Please find attached a patch. Tatsuro Yamada NTT Open Source Software Center diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c index 2851869..54edf4d 100644 --- a/contrib/postgres_fdw/postgres_fdw.c +++ b/contrib/postgres_fdw/postgres_fdw.c @@ -280,7 +280,7 @@ static void postgresGetForeignPaths(PlannerInfo *root, RelOptInfo *baserel, Oid foreigntableid); static ForeignScan *postgresGetForeignPlan(PlannerInfo *root, - RelOptInfo *baserel, + RelOptInfo *foreignrel, Oid foreigntableid, ForeignPath *best_path, List *tlist, -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Minor code improvement to postgresGetForeignJoinPaths
On 2016/11/04 0:29, Robert Haas wrote: Committed. Thanks! Tatsuro Yamada NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Minor code improvement to postgresGetForeignJoinPaths
Hi Ashutosh, You are right. Every call except that one is using NIL, so better to fix it. The pattern was repeated in the recent aggregate pushdown support. Here's patch to fix create_foreignscan_path() call in add_foreign_grouping_paths() as well. Thanks for the review! Tatsuro Yamada NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Minor code improvement to postgresGetForeignJoinPaths
Hi, The last argument of create_foreignscan_path called by postgresGetForeignJoinPaths is set to NULL, but it would be suitable to set it to NIL because the argument type is List. Please find attached a patch. Tatsuro Yamada NTT Open Source Software Center diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c index daf0438..a5de611 100644 --- a/contrib/postgres_fdw/postgres_fdw.c +++ b/contrib/postgres_fdw/postgres_fdw.c @@ -4331,7 +4331,7 @@ postgresGetForeignJoinPaths(PlannerInfo *root, NIL, /* no pathkeys */ NULL, /* no required_outer */ epq_path, - NULL); /* no fdw_private */ + NIL); /* no fdw_private */ /* Add generated path into joinrel by add_path(). */ add_path(joinrel, (Path *) joinpath); -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ERROR: ORDER/GROUP BY expression not found in targetlist
Hi, I ran tpc-h's queries on this version and it's successful. The error is fixed. commit 705ad7f3b523acae0ddfdebd270b7048b2bb8029 Author: Tom Lane Date: Sun Jun 19 13:11:40 2016 -0400 Regards, Tatsuro Yamada NTT OSS Center On 2016/06/18 1:26, Robert Haas wrote: On Fri, Jun 17, 2016 at 9:29 AM, Robert Haas wrote: On Fri, Jun 17, 2016 at 9:04 AM, Amit Kapila wrote: Attached, please find updated patch based on above lines. Due to addition of projection path on top of partial paths, some small additional cost is added for parallel paths. In one of the tests in select_parallel.sql the plan was getting converted to serial plan from parallel plan due to this cost, so I have changed it to make it more restrictive. Before changing, I have debugged the test to confirm that it was due to this new projection path cost. I have added two new tests as well to cover the new code added by patch. I'm going to go work on this patch now. Here is a revised version, which I plan to commit in a few hours before the workday expires. I kept it basically as Amit had it, but I whacked the comments around some and, more substantively, avoided inserting and/or charging for a Result node when that's not necessary. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ERROR: ORDER/GROUP BY expression not found in targetlist
Hi, I applied your patch and run tpc-h. Then I got new errors on Q4,12,17 and 19. ERROR: Aggref found in non-Agg plan node. See bellow, -- postgres=# \i queries/4.explain.sql ERROR: Aggref found in non-Agg plan node STATEMENT: explain analyze select o_orderpriority, count(*) as order_count from orders where o_orderdate >= date '1993-10-01' and o_orderdate < date '1993-10-01' + interval '3' month and exists ( select * from lineitem where l_orderkey = o_orderkey and l_commitdate < l_receiptdate ) group by o_orderpriority order by o_orderpriority LIMIT 1; ---------- Regards, Tatsuro Yamada NTT OSS Center On 2016/06/13 16:18, Amit Kapila wrote: On Mon, Jun 13, 2016 at 11:05 AM, David Rowley mailto:david.row...@2ndquadrant.com>> wrote: > > On 13 June 2016 at 15:39, Thomas Munro mailto:thomas.mu...@enterprisedb.com>> wrote: > > What is going on here? > > ... > > > > > postgres=# set max_parallel_workers_per_gather = 2; > > SET > > postgres=# explain select length(data) from logs group by length(data); > > ERROR: ORDER/GROUP BY expression not found in targetlist > > Seems like this was caused by 04ae11f62e643e07c411c4935ea6af46cb112aa9 > In create_grouping_paths(), we are building partial_grouping_path and same is used for gather path and other grouping paths (for partial paths). However, we don't use it for partial path list and sort path due to which path target for Sort path is different from what we have expected. Is there a problem in applying partial_grouping_path for partial pathlist? Attached patch just does that and I don't see error with patch. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com <http://www.enterprisedb.com/> -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ERROR: ORDER/GROUP BY expression not found in targetlist
On 2016/06/13 15:52, Michael Paquier wrote: On Mon, Jun 13, 2016 at 2:42 PM, Tatsuro Yamada wrote: I got mistake to write an e-mail to -hackers on last week. :-< I should have written this. The bug has not fixed by Tom Lane's patch: commit aeb9ae6. Because I got the same error using tpc-h. This looks like a different regression.. I understand it now, thanks. :-) I checked the list, but the bug is not listed. https://wiki.postgresql.org/wiki/PostgreSQL_9.6_Open_Items And the winner is: 04ae11f62e643e07c411c4935ea6af46cb112aa9 is the first bad commit commit 04ae11f62e643e07c411c4935ea6af46cb112aa9 Author: Robert Haas Date: Fri Jun 3 14:27:33 2016 -0400 I am adding that to the list of open items. Oh... I'll try to run tpc-h if I got a new patch which fixes the bug. :) Thanks, Tatsuro Yamada NTT OSS Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ERROR: ORDER/GROUP BY expression not found in targetlist
Hi, Subject: Re: [HACKERS] ORDER/GROUP BY expression not found in targetlist Date: Thu, 09 Jun 2016 12:08:01 +0900 Right, I saw that thread which involved the same error message: https://www.postgresql.org/message-id/flat/20160526021235.w4nq7k3gnheg7vit%40alap3.anarazel.de#20160526021235.w4nq7k3gnheg7...@alap3.anarazel.de ... but that seems to be a different problem than the one you and I have seen, it involved repeated references to columns in the tlist. It was fixed with this commit: commit aeb9ae6457865c8949641d71a9523374d843a418 Author: Tom Lane Date: Thu May 26 14:52:24 2016 -0400 Disable physical tlist if any Var would need multiple sortgroupref labels. I use this version:f721e94 to run tpc-h on last week. This patch is commited at Jun 8. If it fixed, I didn't get the error. >PG96beta1 > commit: f721e94b5f360391fc3ffe183bf697a0441e9184 - commit f721e94b5f360391fc3ffe183bf697a0441e9184 Author: Robert Haas Date: Wed Jun 8 08:37:06 2016 -0400 Fix typo. Amit Langote - I got mistake to write an e-mail to -hackers on last week. :-< I should have written this. The bug has not fixed by Tom Lane's patch: commit aeb9ae6. Because I got the same error using tpc-h. Today, I try it again by changing max_parallel_workers_per_gather parameter. The result of Q1 is bellow. Is this bug in the Open items on wiki? I don't see it on the Open Issues list. I checked the list, but the bug is not listed. https://wiki.postgresql.org/wiki/PostgreSQL_9.6_Open_Items Regards, Tatsuro Yamada NTT OSS Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ERROR: ORDER/GROUP BY expression not found in targetlist
Hi, I tried to run tpc-h queries, but some queries failed by the error on last week. >Subject: Re: [HACKERS] ORDER/GROUP BY expression not found in targetlist >Date: Thu, 09 Jun 2016 12:08:01 +0900 Today, I try it again by changing max_parallel_workers_per_gather parameter. The result of Q1 is bellow. Is this bug in the Open items on wiki? - postgres=# set max_parallel_workers_per_gather = 0; SET postgres=# \i queries/1.explain.sql QUERY PLAN - Limit (cost=43474.03..43474.03 rows=1 width=236) (actual time=1039.583..1039.583 rows=1 loops=1) -> Sort (cost=43474.03..43474.04 rows=6 width=236) (actual time=1039.583..1039.583 rows=1 loops=1) Sort Key: l_returnflag, l_linestatus Sort Method: top-N heapsort Memory: 25kB -> HashAggregate (cost=43473.83..43474.00 rows=6 width=236) (actual time=1039.529..1039.534 rows=4 loops=1) Group Key: l_returnflag, l_linestatus -> Seq Scan on lineitem (cost=0.00..19668.15 rows=595142 width=25) (actual time=0.048..125.332 rows=595224 loops=1) Filter: (l_shipdate <= '1998-09-22 00:00:00'::timestamp without time zone) Rows Removed by Filter: 5348 Planning time: 0.180 ms Execution time: 1039.758 ms (11 rows) postgres=# set max_parallel_workers_per_gather = default; SET postgres=# \i queries/1.explain.sql ERROR: ORDER/GROUP BY expression not found in targetlist --------- Regards, Tatsuro Yamada NTT OSS Center On 2016/06/13 12:39, Thomas Munro wrote: Hi, What is going on here? postgres=# create table logs as select generate_series(1, 100)::text as data; SELECT 100 postgres=# insert into logs select * from logs; INSERT 0 100 postgres=# insert into logs select * from logs; INSERT 0 200 postgres=# insert into logs select * from logs; INSERT 0 400 postgres=# insert into logs select * from logs; INSERT 0 800 postgres=# insert into logs select * from logs; INSERT 0 1600 postgres=# analyze logs; ANALYZE postgres=# set max_parallel_workers_per_gather = 0; SET postgres=# explain select length(data) from logs group by length(data); ┌┐ │ QUERY PLAN │ ├┤ │ Group (cost=5843157.07..6005642.13 rows=993989 width=4) │ │ Group Key: (length(data))│ │ -> Sort (cost=5843157.07..5923157.11 rows=3218 width=4)│ │ Sort Key: (length(data)) │ │ -> Seq Scan on logs (cost=0.00..541593.22 rows=3218 width=4) │ └┘ (5 rows) postgres=# set max_parallel_workers_per_gather = 2; SET postgres=# explain select length(data) from logs group by length(data); ERROR: ORDER/GROUP BY expression not found in targetlist -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ORDER/GROUP BY expression not found in targetlist
Hi, I got same error by TPC-H: Q1,4,8,12 and 17. I've attached results of the queries. TPC-H (thanks to Tomas Vondra) https://github.com/tvondra/pg_tpch Datasize Scale Factor: 1 PG96beta1 commit: f721e94b5f360391fc3ffe183bf697a0441e9184 Regards, Tatsuro Yamada NTT OSS Center On 2016/05/27 2:22, Tom Lane wrote: > Andres Freund writes: >> trying to reproduce a performance problem I just found: > >> =# CREATE TABLE twocol(col01 int, col02 int); >> =# SELECT DISTINCT col01, col02, col01 FROM twocol ; >> ERROR: XX000: ORDER/GROUP BY expression not found in targetlist >> LOCATION: get_sortgroupref_tle, tlist.c:341 > >> which appears to be a 9.6 regression, presumable fallout from the path >> restructuring. > > Huh. The problem is that createplan.c is trying to apply the > physical-tlist optimization to the seqscan underneath the aggregate > node. That means that the output from the seqscan is just > "col01, col02", which means that col01 can only be decorated with > a single ressortgroupref ... but there are two ressortgrouprefs > for it as far as the groupClause is concerned. Only one gets applied > to the seqscan's tlist, and then later we fail because we don't find > the other one there. Conclusions: > > * we need to back off the physical-tlist optimization in this case > > * the code that transfers sortgroupref labels onto a tlist probably > ought to notice and complain if it's asked to put inconsistent labels > onto the same column. > > I'm a little surprised that it's not discarding the third grouping > item as redundant ... but that's probably not something to mess with > right now. Prior versions don't appear to do that either. > > regards, tom lane > > commit f721e94b5f360391fc3ffe183bf697a0441e9184 [postgres@bluey queries]$ psql psql (9.6beta1) Type "help" for help. postgres=# \i 1.explain.sql ERROR: ORDER/GROUP BY expression not found in targetlist STATEMENT: explain analyze select l_returnflag, l_linestatus, sum(l_quantity) as sum_qty, sum(l_extendedprice) as sum_base_price, sum(l_extendedprice * (1 - l_discount)) as sum_disc_price, sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge, avg(l_quantity) as avg_qty, avg(l_extendedprice) as avg_price, avg(l_discount) as avg_disc, count(*) as count_order from lineitem where l_shipdate <= date '1998-12-01' - interval '70' day group by l_returnflag, l_linestatus order by l_returnflag, l_linestatus LIMIT 1; postgres=# \i 4.explain.sql ERROR: ORDER/GROUP BY expression not found in targetlist STATEMENT: explain analyze select o_orderpriority, count(*) as order_count from orders where o_orderdate >= date '1993-10-01' and o_orderdate < date '1993-10-01' + interval '3' month and exists ( select * from lineitem where l_orderkey = o_orderkey and l_commitdate < l_receiptdate ) group by o_orderpriority order by o_orderpriority LIMIT 1; postgres=# \i 8.explain.sql ERROR: ORDER/GROUP BY expression not found in targetlist STATEMENT: explain analyze select o_year, sum(case when nation = 'EGYPT' then volume else 0 end) / sum(volume) as mkt_share from ( select extract(year from o_orderdate) as o_year, l_extendedprice * (1 - l_discount) as volume, n2.n_name as nation from part, supplier, lineitem, orders, customer, nation n1, nation n2, region where p_partkey = l_partkey and s_suppkey = l_suppkey and l_orderkey = o_or
[HACKERS] Comment typo in port/atomics/generic.h
Hi, I found a typo in generic.h The attached patch fixes it: s/tomic/atomic/g Best regards, Tatsuro Yamada *** a/src/include/port/atomics/generic.h --- b/src/include/port/atomics/generic.h *** *** 1,7 /*- * * generic.h ! * Implement higher level operations based on some lower level tomic * operations. * * Portions Copyright (c) 1996-2016, PostgreSQL Global Development Group --- 1,7 /*- * * generic.h ! * Implement higher level operations based on some lower level atomic * operations. * * Portions Copyright (c) 1996-2016, PostgreSQL Global Development Group -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Comment typo in port/atomics/arch-x86.h
Hi, I found a typo in arch-x86.h. The attached patch fixes a typo: s/468/486/g Best regards, Tatsuro Yamada *** a/src/include/port/atomics/arch-x86.h --- b/src/include/port/atomics/arch-x86.h *** *** 67,73 typedef struct pg_atomic_uint32 /* * It's too complicated to write inline asm for 64bit types on 32bit and the ! * 468 can't do it anyway. */ #ifdef __x86_64__ #define PG_HAVE_ATOMIC_U64_SUPPORT --- 67,73 /* * It's too complicated to write inline asm for 64bit types on 32bit and the ! * 486 can't do it anyway. */ #ifdef __x86_64__ #define PG_HAVE_ATOMIC_U64_SUPPORT -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] A Typo in regress/sql/privileges.sql
Hi, This is my first post to -hackers. I found typos in privileges.sql and privileges.out Please find attached a patch. Best regards, Tatsuro Yamada *** a/src/test/regress/expected/privileges.out --- b/src/test/regress/expected/privileges.out *** *** 390,396 INSERT INTO atest5(two) VALUES (6) ON CONFLICT (two) DO UPDATE set one = 8; -- f ERROR: permission denied for relation atest5 INSERT INTO atest5(three) VALUES (4) ON CONFLICT (two) DO UPDATE set three = 10; -- fails (due to INSERT) ERROR: permission denied for relation atest5 ! -- Check that the the columns in the inference require select privileges -- Error. No privs on four INSERT INTO atest5(three) VALUES (4) ON CONFLICT (four) DO UPDATE set three = 10; ERROR: permission denied for relation atest5 --- 390,396 ERROR: permission denied for relation atest5 INSERT INTO atest5(three) VALUES (4) ON CONFLICT (two) DO UPDATE set three = 10; -- fails (due to INSERT) ERROR: permission denied for relation atest5 ! -- Check that the columns in the inference require select privileges -- Error. No privs on four INSERT INTO atest5(three) VALUES (4) ON CONFLICT (four) DO UPDATE set three = 10; ERROR: permission denied for relation atest5 *** a/src/test/regress/sql/privileges.sql --- b/src/test/regress/sql/privileges.sql *** *** 259,265 INSERT INTO atest5(two) VALUES (6) ON CONFLICT (two) DO UPDATE set three = EXCLU INSERT INTO atest5(two) VALUES (6) ON CONFLICT (two) DO UPDATE set three = EXCLUDED.three; INSERT INTO atest5(two) VALUES (6) ON CONFLICT (two) DO UPDATE set one = 8; -- fails (due to UPDATE) INSERT INTO atest5(three) VALUES (4) ON CONFLICT (two) DO UPDATE set three = 10; -- fails (due to INSERT) ! -- Check that the the columns in the inference require select privileges -- Error. No privs on four INSERT INTO atest5(three) VALUES (4) ON CONFLICT (four) DO UPDATE set three = 10; --- 259,265 INSERT INTO atest5(two) VALUES (6) ON CONFLICT (two) DO UPDATE set three = EXCLUDED.three; INSERT INTO atest5(two) VALUES (6) ON CONFLICT (two) DO UPDATE set one = 8; -- fails (due to UPDATE) INSERT INTO atest5(three) VALUES (4) ON CONFLICT (two) DO UPDATE set three = 10; -- fails (due to INSERT) ! -- Check that the columns in the inference require select privileges -- Error. No privs on four INSERT INTO atest5(three) VALUES (4) ON CONFLICT (four) DO UPDATE set three = 10; -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers