Re: Improvements and additions to COPY progress reporting

2021-03-15 Thread Michael Paquier
On Mon, Mar 15, 2021 at 12:43:40PM +0100, Matthias van de Meent wrote:
> Hmm, does CFBot not run checkout on windows with crlf line endings? I
> had expected it to do as such.

This is environment-sensitive, so I am not surprised that Appveyor
changes the way newlines are handled there.  I could see the
difference by running the tests manually on Windows command prompt for
example.

> That seems great, thanks for picking this up.

Okay.  Applied, then.
--
Michael


signature.asc
Description: PGP signature


Re: Improvements and additions to COPY progress reporting

2021-03-15 Thread Matthias van de Meent
On Mon, 15 Mar 2021 at 05:53, Michael Paquier  wrote:
>
> On Wed, Mar 10, 2021 at 09:35:10AM +0100, Matthias van de Meent wrote:
> > There are examples in which pg_stat_progress_* -views report
> > inaccurate data. I think it is fairly reasonable to at least validate
> > some part of the progress reporting, as it is one of the few methods
> > for administrators to look at the state of currently running
> > administrative tasks, and as such, this user-visible api should be
> > validated.
>
> Looking closer at 0002, the size numbers are actually incorrect on
> Windows for the second query.  The CRLFs at the end of each line of
> emp.data add three bytes to the report of COPY FROM, so this finishes
> with 82 bytes for bytes_total and bytes_processed instead of 79.

Hmm, does CFBot not run checkout on windows with crlf line endings? I
had expected it to do as such.

> Let's make this useful but simpler here, so I propose to check that
> the counters are higher than zero instead of an exact number.  Let's
> also add the relation name relid::regclass while on it.

+1, I hadn't thought of casting relid to its regclass to get a stable
identifier.

> The tests introduced are rather limited, but you are right that
> something is better than nothing here, and I have slightly updated
> what the tests sent previously as per the attached.  What do you
> think?

That seems great, thanks for picking this up.


With regards,

Matthias van de Meent




Re: Improvements and additions to COPY progress reporting

2021-03-14 Thread Michael Paquier
On Wed, Mar 10, 2021 at 09:35:10AM +0100, Matthias van de Meent wrote:
> There are examples in which pg_stat_progress_* -views report
> inaccurate data. I think it is fairly reasonable to at least validate
> some part of the progress reporting, as it is one of the few methods
> for administrators to look at the state of currently running
> administrative tasks, and as such, this user-visible api should be
> validated.

Looking closer at 0002, the size numbers are actually incorrect on
Windows for the second query.  The CRLFs at the end of each line of
emp.data add three bytes to the report of COPY FROM, so this finishes
with 82 bytes for bytes_total and bytes_processed instead of 79.
Let's make this useful but simpler here, so I propose to check that
the counters are higher than zero instead of an exact number.  Let's
also add the relation name relid::regclass while on it.

The tests introduced are rather limited, but you are right that
something is better than nothing here, and I have slightly updated
what the tests sent previously as per the attached.  What do you
think?
--
Michael
diff --git a/src/test/regress/input/copy.source b/src/test/regress/input/copy.source
index a1d529ad36..9e4766898d 100644
--- a/src/test/regress/input/copy.source
+++ b/src/test/regress/input/copy.source
@@ -201,3 +201,65 @@ select * from parted_copytest where b = 1;
 select * from parted_copytest where b = 2;
 
 drop table parted_copytest;
+
+--
+-- Progress reporting for COPY
+--
+create table tab_progress_reporting (
+	name text,
+	age int4,
+	location point,
+	salary int4,
+	manager name
+);
+
+-- Add a trigger to catch and print the contents of the catalog view
+-- pg_stat_progress_copy during data insertion.  This allows to test
+-- the validation of some progress reports for COPY FROM where the trigger
+-- would fire.
+create function notice_after_tab_progress_reporting() returns trigger AS
+$$
+declare report record;
+begin
+	-- The fields ignored here are the ones that may not remain
+	-- consistent across multiple runs.  The sizes reported may differ
+	-- across platforms, so just check if these are strictly positive.
+	with progress_data as (
+	  select
+	 relid::regclass::text as relname,
+	 command,
+	 type,
+	 bytes_processed > 0 as has_bytes_processed,
+	 bytes_total > 0 as has_bytes_total,
+	 tuples_processed,
+	 tuples_excluded
+	from pg_stat_progress_copy
+	where pid = pg_backend_pid())
+	select into report (to_jsonb(r)) as value
+		from progress_data r;
+
+	raise info 'progress: %', report.value::text;
+	return new;
+end;
+$$ language plpgsql;
+
+create trigger check_after_tab_progress_reporting
+	after insert on tab_progress_reporting
+	for each statement
+	execute function notice_after_tab_progress_reporting();
+
+-- Generate COPY FROM report with PIPE.
+copy tab_progress_reporting from stdin;
+sharon	25	(15,12)	1000	sam
+sam	30	(10,5)	2000	bill
+bill	20	(11,10)	1000	sharon
+\.
+
+-- Generate COPY FROM report with FILE, with some excluded tuples.
+truncate tab_progress_reporting;
+copy tab_progress_reporting from '@abs_srcdir@/data/emp.data'
+	where (salary < 2000);
+
+drop trigger check_after_tab_progress_reporting on tab_progress_reporting;
+drop function notice_after_tab_progress_reporting();
+drop table tab_progress_reporting;
diff --git a/src/test/regress/output/copy.source b/src/test/regress/output/copy.source
index 938d3551da..8cf37dc257 100644
--- a/src/test/regress/output/copy.source
+++ b/src/test/regress/output/copy.source
@@ -165,3 +165,57 @@ select * from parted_copytest where b = 2;
 (1 row)
 
 drop table parted_copytest;
+--
+-- Progress reporting for COPY
+--
+create table tab_progress_reporting (
+	name text,
+	age int4,
+	location point,
+	salary int4,
+	manager name
+);
+-- Add a trigger to catch and print the contents of the catalog view
+-- pg_stat_progress_copy during data insertion.  This allows to test
+-- the validation of some progress reports for COPY FROM where the trigger
+-- would fire.
+create function notice_after_tab_progress_reporting() returns trigger AS
+$$
+declare report record;
+begin
+	-- The fields ignored here are the ones that may not remain
+	-- consistent across multiple runs.  The sizes reported may differ
+	-- across platforms, so just check if these are strictly positive.
+	with progress_data as (
+	  select
+	 relid::regclass::text as relname,
+	 command,
+	 type,
+	 bytes_processed > 0 as has_bytes_processed,
+	 bytes_total > 0 as has_bytes_total,
+	 tuples_processed,
+	 tuples_excluded
+	from pg_stat_progress_copy
+	where pid = pg_backend_pid())
+	select into report (to_jsonb(r)) as value
+		from progress_data r;
+
+	raise info 'progress: %', report.value::text;
+	return new;
+end;
+$$ language plpgsql;
+create trigger check_after_tab_progress_reporting
+	after insert on tab_progress_reporting
+	for each statement
+	execute function notice_after_tab_progress_reporting();
+-- Generate 

Re: Improvements and additions to COPY progress reporting

2021-03-10 Thread Matthias van de Meent
On Tue, 9 Mar 2021 at 06:34, Michael Paquier  wrote:
>
> On Mon, Mar 08, 2021 at 05:33:40PM +0100, Matthias van de Meent wrote:
> > Seems reasonable. PFA updated patches. I've renamed the previous 0003
> > to 0002 to keep git-format-patch easy.
>
> Thanks for updating the patch.  0001 has been applied, after tweaking
> a bit comments, indentation and the docs.

Thanks!

> > This is keeping current behaviour of the implementation as committed
> > with 8a4f618e, with the rationale of that patch being that this number
> > should mirror the number returned by the copy command.
> >
> > I am not opposed to adding another column for `tuples_inserted` and
> > changing the logic accordingly (see prototype 0003), but that was not
> > in the intended scope of this patchset. Unless you think that this
> > should be included in this current patchset, I'll spin that patch out
> > into a different thread, but I'm not sure that would make it into
> > pg14.
>
> Okay, point taken.  If there is demand for it in the future, we could
> extend the existing set of columns.  After thinking more about it the
> usecase if not completely clear to me from a monitoring point of
> view.
>
> I have not looked at 0002 in details yet, but I am wondering first if
> the size estimations in the expected output are actually portable.
> Second, I doubt a bit that the extra cycles spent on that are actually
> worth the coverage, even if the trick with an AFTER INSERT trigger is
> interesting.

There are examples in which pg_stat_progress_* -views report
inaccurate data. I think it is fairly reasonable to at least validate
some part of the progress reporting, as it is one of the few methods
for administrators to look at the state of currently running
administrative tasks, and as such, this user-visible api should be
validated.


With regards,

Matthias van de Meent




Re: Improvements and additions to COPY progress reporting

2021-03-09 Thread Josef Šimánek
út 9. 3. 2021 v 6:34 odesílatel Michael Paquier  napsal:
>
> On Mon, Mar 08, 2021 at 05:33:40PM +0100, Matthias van de Meent wrote:
> > Seems reasonable. PFA updated patches. I've renamed the previous 0003
> > to 0002 to keep git-format-patch easy.
>
> Thanks for updating the patch.  0001 has been applied, after tweaking
> a bit comments, indentation and the docs.
>
> > This is keeping current behaviour of the implementation as committed
> > with 8a4f618e, with the rationale of that patch being that this number
> > should mirror the number returned by the copy command.
> >
> > I am not opposed to adding another column for `tuples_inserted` and
> > changing the logic accordingly (see prototype 0003), but that was not
> > in the intended scope of this patchset. Unless you think that this
> > should be included in this current patchset, I'll spin that patch out
> > into a different thread, but I'm not sure that would make it into
> > pg14.
>
> Okay, point taken.  If there is demand for it in the future, we could
> extend the existing set of columns.  After thinking more about it the
> usecase if not completely clear to me from a monitoring point of
> view.
>
> I have not looked at 0002 in details yet, but I am wondering first if
> the size estimations in the expected output are actually portable.
> Second, I doubt a bit that the extra cycles spent on that are actually
> worth the coverage, even if the trick with an AFTER INSERT trigger is
> interesting.

Those extra cycles are in there to cover at least parts of the COPY
progress from being accidentally broken. I have seen various patches
modifying COPY command being currently in progress. It would be nice
to ensure at least basic functionality works well in an automated way.
On my machine there is no huge overhead added by adding those tests
(they finish almost instantly).

> --
> Michael




Re: Improvements and additions to COPY progress reporting

2021-03-09 Thread Bharath Rupireddy
On Tue, Mar 9, 2021 at 11:04 AM Michael Paquier  wrote:
> > This is keeping current behaviour of the implementation as committed
> > with 8a4f618e, with the rationale of that patch being that this number
> > should mirror the number returned by the copy command.
> >
> > I am not opposed to adding another column for `tuples_inserted` and
> > changing the logic accordingly (see prototype 0003), but that was not
> > in the intended scope of this patchset. Unless you think that this
> > should be included in this current patchset, I'll spin that patch out
> > into a different thread, but I'm not sure that would make it into
> > pg14.
>
> Okay, point taken.  If there is demand for it in the future, we could
> extend the existing set of columns.  After thinking more about it the
> usecase if not completely clear to me from a monitoring point of
> view.

I think, for now, having better comments on what's included and what's
not in the existing tuples_processed column would help.

> I have not looked at 0002 in details yet, but I am wondering first if
> the size estimations in the expected output are actually portable.
> Second, I doubt a bit that the extra cycles spent on that are actually
> worth the coverage, even if the trick with an AFTER INSERT trigger is
> interesting.

I was having the same doubt [1], please have a look at it and the few
threads that follow. IMO, we can have the tests without the
"bytes_total" column if we think that it's not stable across all OS
platforms. I think those tests don't take much time to run and AFAIK
it's the first progress report command to have tests because others
such as analyse, vacuum, cluster, base backup don't have.

[1] - 
https://www.postgresql.org/message-id/flat/CALj2ACUyNREth8f3M7wXrHVeycfnqBn5pVygYOoBVs%3Difo8V4A%40mail.gmail.com

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com




Re: Improvements and additions to COPY progress reporting

2021-03-08 Thread Michael Paquier
On Mon, Mar 08, 2021 at 05:33:40PM +0100, Matthias van de Meent wrote:
> Seems reasonable. PFA updated patches. I've renamed the previous 0003
> to 0002 to keep git-format-patch easy.

Thanks for updating the patch.  0001 has been applied, after tweaking
a bit comments, indentation and the docs.

> This is keeping current behaviour of the implementation as committed
> with 8a4f618e, with the rationale of that patch being that this number
> should mirror the number returned by the copy command.
> 
> I am not opposed to adding another column for `tuples_inserted` and
> changing the logic accordingly (see prototype 0003), but that was not
> in the intended scope of this patchset. Unless you think that this
> should be included in this current patchset, I'll spin that patch out
> into a different thread, but I'm not sure that would make it into
> pg14.

Okay, point taken.  If there is demand for it in the future, we could
extend the existing set of columns.  After thinking more about it the
usecase if not completely clear to me from a monitoring point of
view.

I have not looked at 0002 in details yet, but I am wondering first if
the size estimations in the expected output are actually portable.
Second, I doubt a bit that the extra cycles spent on that are actually
worth the coverage, even if the trick with an AFTER INSERT trigger is
interesting.
--
Michael


signature.asc
Description: PGP signature


Re: Improvements and additions to COPY progress reporting

2021-03-08 Thread Matthias van de Meent
On Mon, 8 Mar 2021 at 09:24, Michael Paquier  wrote:
>
> On Sun, Mar 07, 2021 at 04:50:31PM +0530, Bharath Rupireddy wrote:
> > Attaching remaining patches 0001 and 0003 from the v11 patch
> > set(posted upthread) here to make cfbot happier.
>
> Looking at patch 0002, the location of each progress report looks good
> to me.  I have some issues with some of the names chosen though, so I
> would like to suggest a few changes to simplify things:
> - PROGRESS_COPY_IO_TYPE_* => PROGRESS_COPY_TYPE_*
> - PROGRESS_COPY_IO_TYPE => PROGRESS_COPY_TYPE
> - PROGRESS_COPY_TYPE_STDIO => PROGRESS_COPY_TYPE_PIPE
> - In pg_stat_progress_copy, io_type => type

Seems reasonable. PFA updated patches. I've renamed the previous 0003
to 0002 to keep git-format-patch easy.

> It seems a bit confusing to not count insertions on foreign tables
> where nothing happened.  I am fine to live with that, but can I ask if
> this has been thought about?

This is keeping current behaviour of the implementation as committed
with 8a4f618e, with the rationale of that patch being that this number
should mirror the number returned by the copy command.

I am not opposed to adding another column for `tuples_inserted` and
changing the logic accordingly (see prototype 0003), but that was not
in the intended scope of this patchset. Unless you think that this
should be included in this current patchset, I'll spin that patch out
into a different thread, but I'm not sure that would make it into
pg14.

With regards,

Matthias van de Meent.
From 94876abe0ab9c28a6f4b0ac006f356251ca4746c Mon Sep 17 00:00:00 2001
From: Matthias van de Meent 
Date: Mon, 8 Mar 2021 16:08:24 +0100
Subject: [PATCH v12 3/3] Adapt COPY progress reporting to report processed and
 inserted tuples

Previously, tuples_processed was implied to be the amount of tuples inserted
in COPY ... FROM. This code is now changed to make tuples_processed count
all tuples that were identified in the COPY ... FROM command, and make
tuples_inserted count all tuples that were inserted into the table (that is,
it excludes the tuples excluded using the WHERE clause, and those that were
not inserted due to triggers, or failure to insert into foreign tables).

This also renumbers the columns to be back in-order, before the stamping of
pg14 makes those numbers effectively immutable
---
 doc/src/sgml/monitoring.sgml | 30 
 src/backend/catalog/system_views.sql | 13 ++-
 src/backend/commands/copyfrom.c  | 34 
 src/include/commands/progress.h  | 13 ++-
 src/test/regress/expected/rules.out  | 13 ++-
 src/test/regress/output/copy.source  |  4 ++--
 6 files changed, 83 insertions(+), 24 deletions(-)

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 32bebc70db..aa2e15a748 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -6595,6 +6595,24 @@ SELECT pg_stat_get_backend_pid(s.backendid) AS pid,
   
   
Number of tuples already processed by COPY command.
+   For COPY FROM this includes all tuples from the
+   source material (including tuples excluded by the
+   WHERE clause of the COPY
+   command, and tuples that were not inserted as a result of trigger
+   behaviour). For COPY TO this includes all tuples
+   that exist in the table, or those that are returned by the
+   SELECT.
+  
+ 
+
+ 
+  
+   tuples_inserted bigint
+  
+  
+   Number of tuples inserted into the table with the
+   COPY FROM command. Is 0 for
+   COPY TO.
   
  
 
@@ -6610,6 +6628,18 @@ SELECT pg_stat_get_backend_pid(s.backendid) AS pid,
 

   
+  
+   
+The tuples_excluded column does not count the tuples
+that failed to insert, but only those tuples that were available in the
+source material and not eligible for insertion through exclusion by the
+WHERE clause. You can calculate the number of tuples
+that failed to insert into the table due to triggers (or that otherwise
+silently fail to insert) by subtracting both
+tuples_excluded and tuples_inserted
+from tuples_processed instead.
+   
+  
  
 
  
diff --git a/src/backend/catalog/system_views.sql 
b/src/backend/catalog/system_views.sql
index 8f9987f311..f59de36742 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1130,18 +1130,19 @@ CREATE VIEW pg_stat_progress_copy AS
 SELECT
 S.pid AS pid, S.datid AS datid, D.datname AS datname,
 S.relid AS relid,
-CASE S.param5 WHEN 1 THEN 'COPY FROM'
+CASE S.param1 WHEN 1 THEN 'COPY FROM'
   WHEN 2 THEN 'COPY TO'
   END AS command,
-CASE S.param6 WHEN 1 THEN 'FILE'
+CASE S.param2 WHEN 1 THEN 'FILE'
   WHEN 2 THEN 'PROGRAM'
   WHEN 3 THEN 'PIPE'
   WHEN 4 THEN 'CALLBACK'
 

Re: Improvements and additions to COPY progress reporting

2021-03-08 Thread Michael Paquier
On Sun, Mar 07, 2021 at 04:50:31PM +0530, Bharath Rupireddy wrote:
> Attaching remaining patches 0001 and 0003 from the v11 patch
> set(posted upthread) here to make cfbot happier.

Looking at patch 0002, the location of each progress report looks good
to me.  I have some issues with some of the names chosen though, so I
would like to suggest a few changes to simplify things:
- PROGRESS_COPY_IO_TYPE_* => PROGRESS_COPY_TYPE_*
- PROGRESS_COPY_IO_TYPE => PROGRESS_COPY_TYPE
- PROGRESS_COPY_TYPE_STDIO => PROGRESS_COPY_TYPE_PIPE
- In pg_stat_progress_copy, io_type => type

It seems a bit confusing to not count insertions on foreign tables
where nothing happened.  I am fine to live with that, but can I ask if
this has been thought about?
--
Michael


signature.asc
Description: PGP signature


Re: Improvements and additions to COPY progress reporting

2021-03-07 Thread Bharath Rupireddy
On Fri, Mar 5, 2021 at 11:30 AM Michael Paquier  wrote:
>
> On Thu, Mar 04, 2021 at 05:45:50PM +0100, Matthias van de Meent wrote:
> > Sure, I'm convinced. PFA the patchset with this change applied.
>
> 0002 looks fine to me, and in line with the discussion, so applied.

Thanks.

Attaching remaining patches 0001 and 0003 from the v11 patch
set(posted upthread) here to make cfbot happier.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com


v11-0001-Add-progress-reported-components-for-COPY-progre.patch
Description: Binary data


v11-0003-Add-copy-progress-reporting-regression-tests.patch
Description: Binary data


Re: Improvements and additions to COPY progress reporting

2021-03-04 Thread Michael Paquier
On Thu, Mar 04, 2021 at 05:45:50PM +0100, Matthias van de Meent wrote:
> Sure, I'm convinced. PFA the patchset with this change applied.

0002 looks fine to me, and in line with the discussion, so applied.
--
Michael


signature.asc
Description: PGP signature


Re: Improvements and additions to COPY progress reporting

2021-03-04 Thread Matthias van de Meent
On Thu, 4 Mar 2021 at 17:29, Justin Pryzby  wrote:
>
> On Thu, Mar 04, 2021 at 05:19:18PM +0100, Matthias van de Meent wrote:
> >
> > "Backends running [...] report progress to [...] instead" is,
> > a.f.a.i.k., correct English. Adding 'will' would indeed still be
> > correct, but it would in my opinion also be decremental to the
> > readability of the section due to the repeated use of the same
> > template sentence structure. I think that keeping it as-is is just
> > fine.
>
> I'd prefer to see the same thing repeated, since then it's easy to compare, 
> for
> readers, and also future doc authors.  That's normal in technical 
> documentation
> to have redundancy.  It's easy to read.
>
> I'd suggest to move "instead" into the middle of the sentence,
> and combine VACUUM+FULL, and add "their":
>
> > > +... Backends running VACUUM FULL will instead 
> > > report
> > > +their progress in the 
> > > pg_stat_progress_cluster view.

Sure, I'm convinced. PFA the patchset with this change applied.

With regards,

Matthias van de Meent
From af85e442b5f61e145e6214d1fbfc4269af3bc9f1 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Josef=20=C5=A0im=C3=A1nek?= 
Date: Tue, 9 Feb 2021 13:06:45 +0100
Subject: [PATCH v12 3/3] Add copy progress reporting regression tests.
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

This tests some basic features of copy progress reporting.

Sadly, we can only request one snapshot of progress_reporting each
transaction / statement, so we can't (easily) get intermediate results without
each result requiring a seperate statement being run.

Author: Josef Šimánek, Matthias van de Meent
---
 src/test/regress/input/copy.source  | 60 +
 src/test/regress/output/copy.source | 47 ++
 2 files changed, 107 insertions(+)

diff --git a/src/test/regress/input/copy.source b/src/test/regress/input/copy.source
index a1d529ad36..4f1cbc73d2 100644
--- a/src/test/regress/input/copy.source
+++ b/src/test/regress/input/copy.source
@@ -201,3 +201,63 @@ select * from parted_copytest where b = 1;
 select * from parted_copytest where b = 2;
 
 drop table parted_copytest;
+
+--
+-- progress reporting
+--
+
+-- setup
+-- reuse employer datatype, that has a small sized data set
+
+create table progress_reporting (
+	name 		text,
+	age			int4,
+	location 	point,
+	salary 		int4,
+	manager 	name
+);
+
+-- Add a trigger to 'raise info' the contents of pg_stat_progress_copy. This
+-- allows us to test and verify the contents of this view, which would
+-- otherwise require a concurrent session checking that table.
+create function notice_after_progress_reporting() returns trigger AS
+$$
+declare report record;
+begin
+	-- We cannot expect 'pid' nor 'relid' to be consistent over runs due to
+	-- variance in system process ids, and concurrency in runs of tests.
+	-- Additionally, due to the usage of this test in pg_regress, the 'datid'
+	-- also is not consistent between runs.
+	select into report (to_jsonb(r) - '{pid,relid,datid}'::text[]) as value
+		from pg_stat_progress_copy r
+		where pid = pg_backend_pid();
+
+	raise info 'progress: %', report.value::text;
+
+	RETURN NEW;
+END;
+$$ LANGUAGE plpgsql;
+
+create trigger check_after_progress_reporting
+	after insert on progress_reporting
+	for each statement
+	execute function notice_after_progress_reporting();
+
+-- reporting of STDIO imports, and correct bytes-processed/tuples-processed reporting
+
+copy progress_reporting from stdin;
+sharon	25	(15,12)	1000	sam
+sam	30	(10,5)	2000	bill
+bill	20	(11,10)	1000	sharon
+\.
+
+-- reporting of FILE imports, and correct reporting of tuples-excluded
+
+copy progress_reporting from '@abs_srcdir@/data/emp.data'
+	where (salary < 2000);
+
+-- cleanup progress_reporting
+
+drop trigger check_after_progress_reporting on progress_reporting;
+drop function notice_after_progress_reporting();
+drop table progress_reporting;
diff --git a/src/test/regress/output/copy.source b/src/test/regress/output/copy.source
index 938d3551da..f3596bdd15 100644
--- a/src/test/regress/output/copy.source
+++ b/src/test/regress/output/copy.source
@@ -165,3 +165,50 @@ select * from parted_copytest where b = 2;
 (1 row)
 
 drop table parted_copytest;
+--
+-- progress reporting
+--
+-- setup
+-- reuse employer datatype, that has a small sized data set
+create table progress_reporting (
+	name 		text,
+	age			int4,
+	location 	point,
+	salary 		int4,
+	manager 	name
+);
+-- Add a trigger to 'raise info' the contents of pg_stat_progress_copy. This
+-- allows us to test and verify the contents of this view, which would
+-- otherwise require a concurrent session checking that table.
+create function notice_after_progress_reporting() returns trigger AS
+$$
+declare report record;
+begin
+	-- We cannot expect 'pid' nor 'relid' to be consistent over runs due to
+	-- variance in system process ids, and concurrency in runs of tests.
+	-- Additionally, due to the usage of this 

Re: Improvements and additions to COPY progress reporting

2021-03-04 Thread Justin Pryzby
On Thu, Mar 04, 2021 at 05:19:18PM +0100, Matthias van de Meent wrote:
> On Thu, 4 Mar 2021 at 13:36, Bharath Rupireddy 
>  wrote:
> >
> > +   
> > +Each backend running VACUUM without the
> > +FULL option will report its progress in the
> > +pg_stat_progress_vacuum view. Backends running
> > +VACUUM with the FULL option 
> > report
> > +progress in the pg_stat_progress_cluster view
> > +instead. See  and
> > + for details.
> > +   
> >
> > I think a typo, missing "will" between option and report - it's "with
> > the FULL option will report"
> 
> "Backends running [...] report progress to [...] instead" is,
> a.f.a.i.k., correct English. Adding 'will' would indeed still be
> correct, but it would in my opinion also be decremental to the
> readability of the section due to the repeated use of the same
> template sentence structure. I think that keeping it as-is is just
> fine.

I'd prefer to see the same thing repeated, since then it's easy to compare, for
readers, and also future doc authors.  That's normal in technical documentation
to have redundancy.  It's easy to read.

I'd suggest to move "instead" into the middle of the sentence,
and combine VACUUM+FULL, and add "their":

> > +... Backends running VACUUM FULL will instead report
> > +their progress in the 
> > pg_stat_progress_cluster view.

-- 
Justin




Re: Improvements and additions to COPY progress reporting

2021-03-04 Thread Matthias van de Meent
On Thu, 4 Mar 2021 at 13:36, Bharath Rupireddy
 wrote:
>
> +   
> +Each backend running VACUUM without the
> +FULL option will report its progress in the
> +pg_stat_progress_vacuum view. Backends running
> +VACUUM with the FULL option report
> +progress in the pg_stat_progress_cluster view
> +instead. See  and
> + for details.
> +   
>
> I think a typo, missing "will" between option and report - it's "with
> the FULL option will report"

"Backends running [...] report progress to [...] instead" is,
a.f.a.i.k., correct English. Adding 'will' would indeed still be
correct, but it would in my opinion also be decremental to the
readability of the section due to the repeated use of the same
template sentence structure. I think that keeping it as-is is just
fine.



With regards,

Matthias van de Meent.




Re: Improvements and additions to COPY progress reporting

2021-03-04 Thread Josef Šimánek
čt 4. 3. 2021 v 12:32 odesílatel Matthias van de Meent
 napsal:
>
> On Thu, 4 Mar 2021 at 11:38, Michael Paquier  wrote:
> >
> > On Thu, Mar 04, 2021 at 12:16:17PM +0530, Bharath Rupireddy wrote:
> > > IMO, the phrasing proposed by Justin upthread looks good. It's like this:
> > >
> > > > +Each backend running ANALYZE will report its 
> > > > progress in
> > > > +the pg_stat_progress_analyze view. See
> >
> > No objections to just go with that.  As a new patch set is needed, I
> > am switching the CF entry to "Waiting on Author".
>
> Thanks for all your comments, and sorry for the delayed response.
> Please find attached a new version of the patch set, that is rebased
> and contains the requested changes:
>
> 1/3:
> Docs:
> - on which the COPY command is executed
> + on which the COPY command is being executed
> Reworded existing commment:
> - /* Increment amount of processed tuples and update the progress */
> + /* Increment the number of processed tuples, and report the progress */
>
> 2/3:
> Docs:
> - ... report its progress to ...
> + ... report its progress in ...
> - report its progress to the >pg_stat_progress_cluster< ...
> + report its progress in the >pg_stat_progress_cluster< view ...
>
> 3/3:
> No changes
>
> I believe that that was the extent of the not-yet-resolved comments
> and suggestions.

LGTM, special thanks for taking over the work on initial COPY progress
regression tests.

>
> With regards,
>
> Matthias van de Meent.




Re: Improvements and additions to COPY progress reporting

2021-03-04 Thread Bharath Rupireddy
On Thu, Mar 4, 2021 at 5:02 PM Matthias van de Meent
 wrote:
>
> On Thu, 4 Mar 2021 at 11:38, Michael Paquier  wrote:
> >
> > On Thu, Mar 04, 2021 at 12:16:17PM +0530, Bharath Rupireddy wrote:
> > > IMO, the phrasing proposed by Justin upthread looks good. It's like this:
> > >
> > > > +Each backend running ANALYZE will report its 
> > > > progress in
> > > > +the pg_stat_progress_analyze view. See
> >
> > No objections to just go with that.  As a new patch set is needed, I
> > am switching the CF entry to "Waiting on Author".
>
> Thanks for all your comments, and sorry for the delayed response.
> Please find attached a new version of the patch set, that is rebased
> and contains the requested changes:
>
> 1/3:
> Docs:
> - on which the COPY command is executed
> + on which the COPY command is being executed
> Reworded existing commment:
> - /* Increment amount of processed tuples and update the progress */
> + /* Increment the number of processed tuples, and report the progress */

LGTM.

> 2/3:
> Docs:
> - ... report its progress to ...
> + ... report its progress in ...
> - report its progress to the >pg_stat_progress_cluster< ...
> + report its progress in the >pg_stat_progress_cluster< view ...

+   
+Each backend running VACUUM without the
+FULL option will report its progress in the
+pg_stat_progress_vacuum view. Backends running
+VACUUM with the FULL option report
+progress in the pg_stat_progress_cluster view
+instead. See  and
+ for details.
+   

I think a typo, missing "will" between option and report - it's "with
the FULL option will report"

Except the above typo, 0002 LGTM.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com




Re: Improvements and additions to COPY progress reporting

2021-03-04 Thread Matthias van de Meent
On Thu, 4 Mar 2021 at 11:38, Michael Paquier  wrote:
>
> On Thu, Mar 04, 2021 at 12:16:17PM +0530, Bharath Rupireddy wrote:
> > IMO, the phrasing proposed by Justin upthread looks good. It's like this:
> >
> > > +Each backend running ANALYZE will report its 
> > > progress in
> > > +the pg_stat_progress_analyze view. See
>
> No objections to just go with that.  As a new patch set is needed, I
> am switching the CF entry to "Waiting on Author".

Thanks for all your comments, and sorry for the delayed response.
Please find attached a new version of the patch set, that is rebased
and contains the requested changes:

1/3:
Docs:
- on which the COPY command is executed
+ on which the COPY command is being executed
Reworded existing commment:
- /* Increment amount of processed tuples and update the progress */
+ /* Increment the number of processed tuples, and report the progress */

2/3:
Docs:
- ... report its progress to ...
+ ... report its progress in ...
- report its progress to the >pg_stat_progress_cluster< ...
+ report its progress in the >pg_stat_progress_cluster< view ...

3/3:
No changes

I believe that that was the extent of the not-yet-resolved comments
and suggestions.


With regards,

Matthias van de Meent.
From 7831549452b6d95c3db9060333750256675ff322 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Josef=20=C5=A0im=C3=A1nek?= 
Date: Tue, 9 Feb 2021 13:06:45 +0100
Subject: [PATCH v11 3/3] Add copy progress reporting regression tests.
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

This tests some basic features of copy progress reporting.

Sadly, we can only request one snapshot of progress_reporting each
transaction / statement, so we can't (easily) get intermediate results without
each result requiring a seperate statement being run.

Author: Josef Šimánek, Matthias van de Meent
---
 src/test/regress/input/copy.source  | 60 +
 src/test/regress/output/copy.source | 47 ++
 2 files changed, 107 insertions(+)

diff --git a/src/test/regress/input/copy.source b/src/test/regress/input/copy.source
index a1d529ad36..4f1cbc73d2 100644
--- a/src/test/regress/input/copy.source
+++ b/src/test/regress/input/copy.source
@@ -201,3 +201,63 @@ select * from parted_copytest where b = 1;
 select * from parted_copytest where b = 2;
 
 drop table parted_copytest;
+
+--
+-- progress reporting
+--
+
+-- setup
+-- reuse employer datatype, that has a small sized data set
+
+create table progress_reporting (
+	name 		text,
+	age			int4,
+	location 	point,
+	salary 		int4,
+	manager 	name
+);
+
+-- Add a trigger to 'raise info' the contents of pg_stat_progress_copy. This
+-- allows us to test and verify the contents of this view, which would
+-- otherwise require a concurrent session checking that table.
+create function notice_after_progress_reporting() returns trigger AS
+$$
+declare report record;
+begin
+	-- We cannot expect 'pid' nor 'relid' to be consistent over runs due to
+	-- variance in system process ids, and concurrency in runs of tests.
+	-- Additionally, due to the usage of this test in pg_regress, the 'datid'
+	-- also is not consistent between runs.
+	select into report (to_jsonb(r) - '{pid,relid,datid}'::text[]) as value
+		from pg_stat_progress_copy r
+		where pid = pg_backend_pid();
+
+	raise info 'progress: %', report.value::text;
+
+	RETURN NEW;
+END;
+$$ LANGUAGE plpgsql;
+
+create trigger check_after_progress_reporting
+	after insert on progress_reporting
+	for each statement
+	execute function notice_after_progress_reporting();
+
+-- reporting of STDIO imports, and correct bytes-processed/tuples-processed reporting
+
+copy progress_reporting from stdin;
+sharon	25	(15,12)	1000	sam
+sam	30	(10,5)	2000	bill
+bill	20	(11,10)	1000	sharon
+\.
+
+-- reporting of FILE imports, and correct reporting of tuples-excluded
+
+copy progress_reporting from '@abs_srcdir@/data/emp.data'
+	where (salary < 2000);
+
+-- cleanup progress_reporting
+
+drop trigger check_after_progress_reporting on progress_reporting;
+drop function notice_after_progress_reporting();
+drop table progress_reporting;
diff --git a/src/test/regress/output/copy.source b/src/test/regress/output/copy.source
index 938d3551da..f3596bdd15 100644
--- a/src/test/regress/output/copy.source
+++ b/src/test/regress/output/copy.source
@@ -165,3 +165,50 @@ select * from parted_copytest where b = 2;
 (1 row)
 
 drop table parted_copytest;
+--
+-- progress reporting
+--
+-- setup
+-- reuse employer datatype, that has a small sized data set
+create table progress_reporting (
+	name 		text,
+	age			int4,
+	location 	point,
+	salary 		int4,
+	manager 	name
+);
+-- Add a trigger to 'raise info' the contents of pg_stat_progress_copy. This
+-- allows us to test and verify the contents of this view, which would
+-- otherwise require a concurrent session checking that table.
+create function notice_after_progress_reporting() returns trigger AS
+$$
+declare report record;

Re: Improvements and additions to COPY progress reporting

2021-03-04 Thread Michael Paquier
On Thu, Mar 04, 2021 at 12:16:17PM +0530, Bharath Rupireddy wrote:
> IMO, the phrasing proposed by Justin upthread looks good. It's like this:
> 
> > +Each backend running ANALYZE will report its 
> > progress in
> > +the pg_stat_progress_analyze view. See

No objections to just go with that.  As a new patch set is needed, I
am switching the CF entry to "Waiting on Author".
--
Michael


signature.asc
Description: PGP signature


Re: Improvements and additions to COPY progress reporting

2021-03-03 Thread Bharath Rupireddy
On Wed, Feb 24, 2021 at 1:23 PM Justin Pryzby  wrote:
>
> On Sun, Feb 21, 2021 at 08:10:09PM +0100, Matthias van de Meent wrote:
> > Subject: [PATCH v9 1/3] Add progress-reported components for COPY progress
> >  reporting
>
> >   /* Increment amount of processed tuples and update 
> > the progress */
> >   /* Increment amount of processed tuples and update the progress */
>
> Ideally, this would say "number of processed tuples"

Correct. It's introduced by the original COPY progress reporting
patch. Having said that, we could just correct them in the 0001 patch.

> Looking at the existing docs:
>
> https://www.postgresql.org/docs/devel/progress-reporting.html#COPY-PROGRESS-REPORTING
> | OID of the table on which the COPY command is executed
>
> Maybe it should say ".. is executing".  Or ".. being executed":
> | OID of the table on which the COPY command is being executed

+1 for changing it to "OID of the table on which the COPY command is
being executed."

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com




Re: Improvements and additions to COPY progress reporting

2021-03-03 Thread Bharath Rupireddy
On Thu, Feb 25, 2021 at 1:44 PM Michael Paquier  wrote:
>
> On Wed, Feb 24, 2021 at 01:53:03AM -0600, Justin Pryzby wrote:
> > On Sun, Feb 21, 2021 at 08:10:09PM +0100, Matthias van de Meent wrote:
> > I think these should say that they report their progress *in* the view (not
> > "to"):
> >
> > > +Each backend running ANALYZE will report its 
> > > progress to
> > > +the pg_stat_progress_analyze view. See
>
> What is proposed in the patch is:
> "Each backend running blah will report its progress to the
> pg_stat_progress_blah view."
>
> What you propose is:
> "Each backend running blah will report its progress in the
> pg_stat_progress_blah view."

IMO, the phrasing proposed by Justin upthread looks good. It's like this:

> +Each backend running ANALYZE will report its progress 
> in
> +the pg_stat_progress_analyze view. See

> +Each backend running CREATE INDEX will report its
> +progress in the pg_stat_progress_create_index 
> view

...
...

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com




Re: Improvements and additions to COPY progress reporting

2021-02-25 Thread Michael Paquier
On Wed, Feb 24, 2021 at 01:53:03AM -0600, Justin Pryzby wrote:
> On Sun, Feb 21, 2021 at 08:10:09PM +0100, Matthias van de Meent wrote:
> I think these should say that they report their progress *in* the view (not
> "to"):
> 
> > +Each backend running ANALYZE will report its 
> > progress to
> > +the pg_stat_progress_analyze view. See

What is proposed in the patch is:
"Each backend running blah will report its progress to the
pg_stat_progress_blah view."

What you propose is:
"Each backend running blah will report its progress in the
pg_stat_progress_blah view."

What pg_basebackup tells is:
"Whenever pg_basebackup is taking a base
backup, the server's pg_stat_progress_basebackup view will report the
progress of the backup."

Here is an extra idea:
"Whenever a backend runs blah, the server's
pg_stat_progress_blah will report the progress of this command."
--
Michael


signature.asc
Description: PGP signature


Re: Improvements and additions to COPY progress reporting

2021-02-23 Thread Justin Pryzby
On Sun, Feb 21, 2021 at 08:10:09PM +0100, Matthias van de Meent wrote:
> Subject: [PATCH v9 1/3] Add progress-reported components for COPY progress
>  reporting

>   /* Increment amount of processed tuples and update the 
> progress */
>   /* Increment amount of processed tuples and update the progress */

Ideally, this would say "number of processed tuples"

> Subject: [PATCH v9 2/3] Add backlinks to progress reporting documentation

I think these should say that they report their progress *in* the view (not
"to"):

> +Each backend running ANALYZE will report its progress 
> to
> +the pg_stat_progress_analyze view. See

> +Each backend running CLUSTER will report its progress 
> to
> +the pg_stat_progress_cluster view. See

> +Each backend running COPY will report its progress to
> +the pg_stat_progress_copy view. See

> +Each backend running CREATE INDEX will report its
> +progress to the pg_stat_progress_create_index

> +Each backend running REINDEX will report its progress
> +to the pg_stat_progress_create_index view. See

Like this one:

> +Each backend running VACUUM without the
> +FULL option will report its progress in the

I'm sorry I didn't include that in last week's message.  You could also write:

|"The progress of each backend running >ANALYZE< is reported in the 
>pg_stat_progress_analyze< view."

Looking at the existing docs:

https://www.postgresql.org/docs/devel/progress-reporting.html#COPY-PROGRESS-REPORTING
| OID of the table on which the COPY command is executed

Maybe it should say ".. is executing".  Or ".. being executed":
| OID of the table on which the COPY command is being executed

-- 
Justin




Re: Improvements and additions to COPY progress reporting

2021-02-23 Thread Michael Paquier
On Tue, Feb 23, 2021 at 10:27:24AM +0100, Matthias van de Meent wrote:
> Note, I'm happy to be proven wrong here, in which case I don't
> disagree, but according to my limited knowledge, these outputs should
> be stable.

I am planning to look more at 0001 and 0003, but for now I have been
looking at 0002 which is interesting on its own.

+pg_stat_progress_vacuum view. Backends running
+VACUUM with the FULL option report
+progress in the pg_stat_progress_cluster instead.
You have missed one "view" after pg_stat_progress_cluster here.
Except that, this stuff looks fine.  So I'd like to apply it if there
are no objections.
--
Michael


signature.asc
Description: PGP signature


Re: Improvements and additions to COPY progress reporting

2021-02-23 Thread Bharath Rupireddy
On Tue, Feb 23, 2021 at 2:57 PM Matthias van de Meent
 wrote:
> On Mon, 22 Feb 2021 at 05:49, Bharath Rupireddy
>  wrote:
> >
> > On Mon, Feb 22, 2021 at 12:40 AM Matthias van de Meent
> >  wrote:
> > >
> > > On Sat, 20 Feb 2021 at 07:09, Bharath Rupireddy
> > >  wrote:
> > > >
> > > >  For COPY TO the name "source_type" column and for COPY FROM the name
> > > > "destination_type" makes sense. To have a combined column name for
> > > > both, how about naming that column as "io_type"?
> > >
> > > Thank you, that's way better! PFA what I believe is a finalized
> > > patchset v9, utilizing io_type terminology instead of io_target.
> >
> > Thanks for the patches. I reviewed them.
> >
> > 0001 -  I think there's a bug. See COPY TO stdout doesn't print
> > io_type as "STDIO".
>
> Fixed in attached

Thanks.

> > 0003 - patch:
> > I'm doubtful if the "bytes_total": 79 i.e. test file size will be the
> > same across different platforms and file systems types, if true, then
> > the below tests will not be stable. Do we also want to exclude the
> > bytes_total from the output, just to be on the safer side? Thoughts?
>
> I'm fairly certain that input files of the regression tests are
> considered 'binary files' to the test framework and that contents
> don't change between different architectures or OSes. I also think
> that any POSIX-compliant file system would report anything but the
> size of the file contents, i.e. the size of the blob that is the file,
> and that is correctly reported here. Other than that, if bytes_total
> wouldn't be stable, then bytes_processed wouldn't make sense either.
>
> For STDIN / STDOUT you might also have a point (different input
> methods might have different length encodings for the specified
> input), but insofar that I understand the test framework and the
> expected git configurations, the tests run using UTF-8 / ascii only,
> with a single style of newlines[+]. Sadly, I cannot provide examples
> nor outputs for other test framework settings due to my lack of
> experience with running the tests with non-standard settings.
>
> Note, I'm happy to be proven wrong here, in which case I don't
> disagree, but according to my limited knowledge, these outputs should
> be stable.

I'm no expert in different OS architectures, but I see that the
patches are passing on cf bot where they get tested on Windows,
FreeBSD, Linux and macOS platforms.

I have no further comments on the v10 patch set. I tested the patches,
they work as expected. I will mark the cf entry as "Ready for
Committer".

Below are some snapshots from testing:

postgres=# select * from pg_stat_progress_copy;
   pid   | datid | datname  | relid |  command  | io_type |
bytes_processed | bytes_total | tuples_processed | tuples_excluded
-+---+--+---+---+-+-+-+--+-
 1089927 | 13003 | postgres | 16384 | COPY FROM | FILE|
104660992 |   88898 | 12861112 |   0
 1089969 | 13003 | postgres | 16384 | COPY FROM | FILE|
76611584 |   88898 |  9712999 |   0

postgres=# select * from pg_stat_progress_copy;
   pid   | datid | datname  | relid |  command  | io_type |
bytes_processed | bytes_total | tuples_processed | tuples_excluded
-+---+--+---+---+-+-+-+--+-
 1089927 | 13003 | postgres | 16384 | COPY FROM | FILE|
203161600 |   88898 |0 |23804080
 1089969 | 13003 | postgres | 16384 | COPY FROM | FILE|
150601728 |   88898 |0 |17961241


 postgres=# select * from pg_stat_progress_copy;
   pid   | datid | datname  | relid | command | io_type |
bytes_processed | bytes_total | tuples_processed | tuples_excluded
-+---+--+---+-+-+-+-+--+-
 1089927 | 13003 | postgres | 16384 | COPY TO | FILE|
66806479 |   0 |  7422942 |   0
 1089969 | 13003 | postgres | 16384 | COPY TO | FILE|
29803951 |   0 |  3311550 |   0

 postgres=# select * from pg_stat_progress_copy;
   pid   | datid | datname  | relid | command | io_type |
bytes_processed | bytes_total | tuples_processed | tuples_excluded
-+---+--+---+-+-+-+-+--+-
 1089927 | 13003 | postgres | 16384 | COPY TO | STDIO   |
5998293 |   0 |   666477 |   0
 1089969 | 13003 | postgres | 16384 | COPY TO | STDIO   |
2780586 |   0 |   308954 |   0

 postgres=# select * from pg_stat_progress_copy;
   pid   | datid | datname  | relid | command | io_type |
bytes_processed | bytes_total | tuples_processed | tuples_excluded

Re: Improvements and additions to COPY progress reporting

2021-02-23 Thread Matthias van de Meent
On Mon, 22 Feb 2021 at 05:49, Bharath Rupireddy
 wrote:
>
> On Mon, Feb 22, 2021 at 12:40 AM Matthias van de Meent
>  wrote:
> >
> > On Sat, 20 Feb 2021 at 07:09, Bharath Rupireddy
> >  wrote:
> > >
> > >  For COPY TO the name "source_type" column and for COPY FROM the name
> > > "destination_type" makes sense. To have a combined column name for
> > > both, how about naming that column as "io_type"?
> >
> > Thank you, that's way better! PFA what I believe is a finalized
> > patchset v9, utilizing io_type terminology instead of io_target.
>
> Thanks for the patches. I reviewed them.
>
> 0001 -  I think there's a bug. See COPY TO stdout doesn't print
> io_type as "STDIO".

Fixed in attached

> 0003 - patch:
> I'm doubtful if the "bytes_total": 79 i.e. test file size will be the
> same across different platforms and file systems types, if true, then
> the below tests will not be stable. Do we also want to exclude the
> bytes_total from the output, just to be on the safer side? Thoughts?

I'm fairly certain that input files of the regression tests are
considered 'binary files' to the test framework and that contents
don't change between different architectures or OSes. I also think
that any POSIX-compliant file system would report anything but the
size of the file contents, i.e. the size of the blob that is the file,
and that is correctly reported here. Other than that, if bytes_total
wouldn't be stable, then bytes_processed wouldn't make sense either.

For STDIN / STDOUT you might also have a point (different input
methods might have different length encodings for the specified
input), but insofar that I understand the test framework and the
expected git configurations, the tests run using UTF-8 / ascii only,
with a single style of newlines[+]. Sadly, I cannot provide examples
nor outputs for other test framework settings due to my lack of
experience with running the tests with non-standard settings.

Note, I'm happy to be proven wrong here, in which case I don't
disagree, but according to my limited knowledge, these outputs should
be stable.


With regards,

Matthias van de Meent

[+] Except when explicitly configured to run using non-standard
configurations, in which case there are different expected output
values to be configured for that configuration.
From 3729dd30ac10af14b056db6c41d44f8c6b39b444 Mon Sep 17 00:00:00 2001
From: Matthias van de Meent 
Date: Fri, 12 Feb 2021 14:06:44 +0100
Subject: [PATCH v10 1/3] Add progress-reported components for COPY progress
 reporting

The command, io target and excluded tuple count (by the
COPY ... FROM ... WHERE -clause) are now reported in the pg_stat_progress_copy
view. Additionally, the column lines_processed is renamed to tuples_processed
to disambiguate the meaning of this column in cases of CSV and BINARY copies
and to stay consistent with regards to names in the pg_stat_progress_*-views.

Of special interest is the reporting of io_type, with which we can
distinguish logical replications' initial table synchronization workers'
progress without having to join the pg_stat_activity view.
---
 doc/src/sgml/monitoring.sgml | 37 ++--
 src/backend/catalog/system_views.sql | 11 -
 src/backend/commands/copyfrom.c  | 27 +---
 src/backend/commands/copyto.c| 18 --
 src/include/commands/progress.h  | 15 ++-
 src/test/regress/expected/rules.out  | 15 ++-
 6 files changed, 113 insertions(+), 10 deletions(-)

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 3513e127b7..a21bee69d8 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -6546,6 +6546,29 @@ SELECT pg_stat_get_backend_pid(s.backendid) AS pid,
   
  
 
+ 
+  
+   command text
+  
+  
+   The command that is running: COPY FROM, or
+   COPY TO.
+  
+ 
+
+ 
+  
+   io_type text
+  
+  
+   The io type that the data is read from or written to:
+   FILE, PROGRAM,
+   STDIO (for COPY FROM STDIN and
+   COPY TO STDOUT), or CALLBACK
+   (used in the table synchronization background worker).
+  
+ 
+
  
   
bytes_processed bigint
@@ -6567,10 +6590,20 @@ SELECT pg_stat_get_backend_pid(s.backendid) AS pid,
 
  
   
-   lines_processed bigint
+   tuples_processed bigint
+  
+  
+   Number of tuples already processed by COPY command.
+  
+ 
+
+ 
+  
+   tuples_excluded bigint
   
   
-   Number of lines already processed by COPY command.
+   Number of tuples not processed because they were excluded by the
+   WHERE clause of the COPY command.
   
  
 
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index fa58afd9d7..f96ad0a5dc 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1129,9 +1129,18 @@ CREATE 

Re: Improvements and additions to COPY progress reporting

2021-02-21 Thread Bharath Rupireddy
On Mon, Feb 22, 2021 at 12:40 AM Matthias van de Meent
 wrote:
>
> On Sat, 20 Feb 2021 at 07:09, Bharath Rupireddy
>  wrote:
> >
> >  For COPY TO the name "source_type" column and for COPY FROM the name
> > "destination_type" makes sense. To have a combined column name for
> > both, how about naming that column as "io_type"?
>
> Thank you, that's way better! PFA what I believe is a finalized
> patchset v9, utilizing io_type terminology instead of io_target.

Thanks for the patches. I reviewed them.

0001 -  I think there's a bug. See COPY TO stdout doesn't print
io_type as "STDIO".

postgres=# select * from pg_stat_progress_copy;
  pid   | datid | datname  | relid | command | io_type |
bytes_processed | bytes_total | tuples_processed | tuples_excluded
+---+--+---+-+-+-+-+--+-
 977510 | 13003 | postgres | 16384 | COPY TO | |
23961591 |   0 |  2662399 |   0
(1 row)

we should do below(like we do for copyfrom.c):

@@ -702,7 +710,10 @@ BeginCopyTo(ParseState *pstate,
 if (pipe)
 {
progress_vals[1] = PROGRESS_COPY_IO_TYPE_STDIO;
 Assert(!is_program);/* the grammar does not allow this */
 if (whereToSendOutput != DestRemote)
 cstate->copy_file = stdout;
 }

Because if "pipe" is true, that means the transfer is between STDIO.
See below comment:

 * If  is false, transfer is between the table and the file named
 * .  Otherwise, transfer is between the table and our regular
 * input/output stream. The latter could be either stdin/stdout or a
 * socket, depending on whether we're running under Postmaster control.
 *

0002 patch looks good to me.

0003 - patch:
I'm doubtful if the "bytes_total": 79 i.e. test file size will be the
same across different platforms and file systems types, if true, then
the below tests will not be stable. Do we also want to exclude the
bytes_total from the output, just to be on the safer side? Thoughts?

copy progress_reporting from stdin;
+INFO:  progress: {"command": "COPY FROM", "datname": "regression",
"io_type": "STDIO", "bytes_total": 0, "bytes_processed": 79,
"tuples_excluded": 0, "tuples_processed": 3}
+-- reporting of FILE imports, and correct reporting of tuples-excluded
+copy progress_reporting from '@abs_srcdir@/data/emp.data'
+where (salary < 2000);
+INFO:  progress: {"command": "COPY FROM", "datname": "regression",
"io_type": "FILE", "bytes_total": 79, "bytes_processed": 79,
"tuples_excluded": 1, "tuples_processed": 2}
+-- cleanup progress_reporting

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com




Re: Improvements and additions to COPY progress reporting

2021-02-21 Thread Matthias van de Meent
On Sat, 20 Feb 2021 at 07:09, Bharath Rupireddy
 wrote:
>
>  For COPY TO the name "source_type" column and for COPY FROM the name
> "destination_type" makes sense. To have a combined column name for
> both, how about naming that column as "io_type"?

Thank you, that's way better! PFA what I believe is a finalized
patchset v9, utilizing io_type terminology instead of io_target.


With regards,

Matthias van de Meent
From 5c0c9658e7c50a2a2d74a738b7bdbfcaa6362e24 Mon Sep 17 00:00:00 2001
From: Matthias van de Meent 
Date: Fri, 12 Feb 2021 14:06:44 +0100
Subject: [PATCH v9 1/3] Add progress-reported components for COPY progress
 reporting

The command, io target and excluded tuple count (by the
COPY ... FROM ... WHERE -clause) are now reported in the pg_stat_progress_copy
view. Additionally, the column lines_processed is renamed to tuples_processed
to disambiguate the meaning of this column in cases of CSV and BINARY copies
and to stay consistent with regards to names in the pg_stat_progress_*-views.

Of special interest is the reporting of io_type, with which we can
distinguish logical replications' initial table synchronization workers'
progress without having to join the pg_stat_activity view.
---
 doc/src/sgml/monitoring.sgml | 37 ++--
 src/backend/catalog/system_views.sql | 11 -
 src/backend/commands/copyfrom.c  | 27 +---
 src/backend/commands/copyto.c| 19 --
 src/include/commands/progress.h  | 15 ++-
 src/test/regress/expected/rules.out  | 15 ++-
 6 files changed, 114 insertions(+), 10 deletions(-)

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 3513e127b7..a21bee69d8 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -6546,6 +6546,29 @@ SELECT pg_stat_get_backend_pid(s.backendid) AS pid,
   
  
 
+ 
+  
+   command text
+  
+  
+   The command that is running: COPY FROM, or
+   COPY TO.
+  
+ 
+
+ 
+  
+   io_type text
+  
+  
+   The io type that the data is read from or written to:
+   FILE, PROGRAM,
+   STDIO (for COPY FROM STDIN and
+   COPY TO STDOUT), or CALLBACK
+   (used in the table synchronization background worker).
+  
+ 
+
  
   
bytes_processed bigint
@@ -6567,10 +6590,20 @@ SELECT pg_stat_get_backend_pid(s.backendid) AS pid,
 
  
   
-   lines_processed bigint
+   tuples_processed bigint
+  
+  
+   Number of tuples already processed by COPY command.
+  
+ 
+
+ 
+  
+   tuples_excluded bigint
   
   
-   Number of lines already processed by COPY command.
+   Number of tuples not processed because they were excluded by the
+   WHERE clause of the COPY command.
   
  
 
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index fa58afd9d7..f96ad0a5dc 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1129,9 +1129,18 @@ CREATE VIEW pg_stat_progress_copy AS
 SELECT
 S.pid AS pid, S.datid AS datid, D.datname AS datname,
 S.relid AS relid,
+CASE S.param5 WHEN 1 THEN 'COPY FROM'
+  WHEN 2 THEN 'COPY TO'
+  END AS command,
+CASE S.param6 WHEN 1 THEN 'FILE'
+  WHEN 2 THEN 'PROGRAM'
+  WHEN 3 THEN 'STDIO'
+  WHEN 4 THEN 'CALLBACK'
+  END AS io_type,
 S.param1 AS bytes_processed,
 S.param2 AS bytes_total,
-S.param3 AS lines_processed
+S.param3 AS tuples_processed,
+S.param4 AS tuples_excluded
 FROM pg_stat_get_progress_info('COPY') AS S
 LEFT JOIN pg_database D ON S.datid = D.oid;
 
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 796ca7b3f7..36828246c1 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -539,7 +539,8 @@ CopyFrom(CopyFromState cstate)
 	BulkInsertState bistate = NULL;
 	CopyInsertMethod insertMethod;
 	CopyMultiInsertInfo multiInsertInfo = {0};	/* pacify compiler */
-	uint64		processed = 0;
+	int64		processed = 0;
+	int64		excluded = 0;
 	bool		has_before_insert_row_trig;
 	bool		has_instead_insert_row_trig;
 	bool		leafpart_use_multi_insert = false;
@@ -869,7 +870,11 @@ CopyFrom(CopyFromState cstate)
 			econtext->ecxt_scantuple = myslot;
 			/* Skip items that don't match COPY's WHERE clause */
 			if (!ExecQual(cstate->qualexpr, econtext))
+			{
+/* Report that this tuple was filtered out by the WHERE clause */
+pgstat_progress_update_param(PROGRESS_COPY_TUPLES_EXCLUDED, ++excluded);
 continue;
+			}
 		}
 
 		/* Determine the partition to insert the tuple into */
@@ -1107,7 +1112,7 @@ CopyFrom(CopyFromState cstate)
 			 * for counting tuples inserted by an INSERT 

Re: Improvements and additions to COPY progress reporting

2021-02-20 Thread Michael Paquier
On Sat, Feb 20, 2021 at 02:29:44PM +0530, Bharath Rupireddy wrote:
> Yeah. We could use pgstat_progress_update_multi_param instead of
> pgstat_progress_update_param to update multiple params.
> 
> On a quick scan through the code, I found that we can do the following. If
> okay, I can start a new thread so that we don't divert the main thread
> here. Thoughts?

Having a separate thread to discuss this part would be right.  This
way any patches sent would attract the correct audience.
--
Michael


signature.asc
Description: PGP signature


Re: Improvements and additions to COPY progress reporting

2021-02-20 Thread Josef Šimánek
so 20. 2. 2021 v 7:09 odesílatel Bharath Rupireddy
 napsal:
>
> On Fri, Feb 19, 2021 at 2:34 AM Tomas Vondra
>  wrote:
> > > On Mon, 15 Feb 2021 at 17:07, Tomas Vondra
> > >  wrote:
> > >>
> > >> - The blocks in copyfrom.cc/copyto.c should be reworked - I don't think
> > >> we do this in our codebase.
> > >
> > > I saw this being used in (re)index progress reporting, that's where I
> > > took inspiration from. It has been fixed in the attached version.
> > >
> >
> > Hmmm, good point. I haven't looked at the other places reporting
> > progress and I only ever saw this pattern in old code. I kinda dislike
> > these blocks, but admittedly that's rather subjective view. So if other
> > similar places do this when reporting progress, this probably should
> > too. What's your opinion on this?
>
> Actually in the code base the style of that variable declaration and
> usage of pgstat_progress_update_multi_param is a mix. For instance, in
> lazy_scan_heap, ReindexRelationConcurrently, the variables are
> declared at the start of the function. And in _bt_spools_heapscan,
> index_build, validate_index, perform_base_backup, the variables are
> declared within a separate block.
>
> IMO, we can have the arrays declared at the start of the functions
> i.e. the way it's done in v8-0001, because we can extend them for
> reporting some other parameter(maybe in future).
>
> > >> - I fir the "io_target" name misleading, because in some cases it's
> > >> actually the *source*.
> > >
> > > Yes, I was also not quite happy with this, but couldn't find a better
> > > one at the point of writing the initial patchset. Would
> > > "io_operations", "io_port", "operates_through" or "through" maybe be
> > > better?
> > >
> >
> > No idea. Let's see if someone has a better proposal ...
>
>  For COPY TO the name "source_type" column and for COPY FROM the name
> "destination_type" makes sense. To have a combined column name for
> both, how about naming that column as "io_type"?

+1 on "io_type", that is my best candidate as well

> With Regards,
> Bharath Rupireddy.
> EnterpriseDB: http://www.enterprisedb.com




Re: Improvements and additions to COPY progress reporting

2021-02-20 Thread Bharath Rupireddy
On Sat, Feb 20, 2021 at 12:49 PM Michael Paquier 
wrote:
>
> On Sat, Feb 20, 2021 at 11:39:22AM +0530, Bharath Rupireddy wrote:
> > Actually in the code base the style of that variable declaration and
> > usage of pgstat_progress_update_multi_param is a mix. For instance, in
> > lazy_scan_heap, ReindexRelationConcurrently, the variables are
> > declared at the start of the function. And in _bt_spools_heapscan,
> > index_build, validate_index, perform_base_backup, the variables are
> > declared within a separate block.
>
> I think that we should encourage the use of
> pgstat_progress_update_multi_param() where we can, as it makes
> consistent the updates to all the parameters according to
> st_changecount.  That's also usually cleaner to store all the
> parameters that are changed if these are updated multiple times like
> the REINDEX CONCURRENTLY ones.  The context of the code also matters,
> of course.

Yeah. We could use pgstat_progress_update_multi_param instead of
pgstat_progress_update_param to update multiple params.

On a quick scan through the code, I found that we can do the following. If
okay, I can start a new thread so that we don't divert the main thread
here. Thoughts?

@@ -3686,12 +3686,18 @@ reindex_index(Oid indexId, bool
skip_constraint_checks, char persistence,
  if (progress)
 {
+const intprogress_cols[] = {
+PROGRESS_CREATEIDX_COMMAND,
+PROGRESS_CREATEIDX_INDEX_OID
+};
+const int64progress_vals[] = {
+PROGRESS_CREATEIDX_COMMAND_REINDEX,
+indexId
+};
+
 pgstat_progress_start_command(PROGRESS_COMMAND_CREATE_INDEX,
   heapId);
-pgstat_progress_update_param(PROGRESS_CREATEIDX_COMMAND,
- PROGRESS_CREATEIDX_COMMAND_REINDEX);
-pgstat_progress_update_param(PROGRESS_CREATEIDX_INDEX_OID,
- indexId);
+pgstat_progress_update_multi_param(2, progress_cols,
progress_vals);
 }
@@ -1457,10 +1457,21 @@ DefineIndex(Oid relationId,
 set_indexsafe_procflags();
 /*
- * The index is now visible, so we can report the OID.
+ * The index is now visible, so we can report the OID. And also, report
+ * Phase 2 of concurrent index build.
  */
-pgstat_progress_update_param(PROGRESS_CREATEIDX_INDEX_OID,
- indexRelationId);
+{
+const intprogress_cols[] = {
+PROGRESS_CREATEIDX_INDEX_OID,
+PROGRESS_CREATEIDX_PHASE
+};
+const int64progress_vals[] = {
+indexRelationId,
+PROGRESS_CREATEIDX_PHASE_WAIT_1
+};
+
+pgstat_progress_update_multi_param(2, progress_cols,
progress_vals);
+}
@@ -284,12 +284,9 @@ cluster_rel(Oid tableOid, Oid indexOid, ClusterParams
*params)
 CHECK_FOR_INTERRUPTS();
  pgstat_progress_start_command(PROGRESS_COMMAND_CLUSTER, tableOid);
-if (OidIsValid(indexOid))
-pgstat_progress_update_param(PROGRESS_CLUSTER_COMMAND,
- PROGRESS_CLUSTER_COMMAND_CLUSTER);
-else
-pgstat_progress_update_param(PROGRESS_CLUSTER_COMMAND,
- PROGRESS_CLUSTER_COMMAND_VACUUM_FULL);
+pgstat_progress_update_param(PROGRESS_CLUSTER_COMMAND,
+ OidIsValid(indexOid) ?
PROGRESS_CLUSTER_COMMAND_CLUSTER :
+ PROGRESS_CLUSTER_COMMAND_VACUUM_FULL);

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com


Re: Improvements and additions to COPY progress reporting

2021-02-19 Thread Michael Paquier
On Sat, Feb 20, 2021 at 11:39:22AM +0530, Bharath Rupireddy wrote:
> Actually in the code base the style of that variable declaration and
> usage of pgstat_progress_update_multi_param is a mix. For instance, in
> lazy_scan_heap, ReindexRelationConcurrently, the variables are
> declared at the start of the function. And in _bt_spools_heapscan,
> index_build, validate_index, perform_base_backup, the variables are
> declared within a separate block.

I think that we should encourage the use of
pgstat_progress_update_multi_param() where we can, as it makes
consistent the updates to all the parameters according to
st_changecount.  That's also usually cleaner to store all the
parameters that are changed if these are updated multiple times like
the REINDEX CONCURRENTLY ones.  The context of the code also matters,
of course.

Scanning through the patch set, 0002 is a good idea taken
independently.
--
Michael


signature.asc
Description: PGP signature


Re: Improvements and additions to COPY progress reporting

2021-02-19 Thread Bharath Rupireddy
On Fri, Feb 19, 2021 at 2:34 AM Tomas Vondra
 wrote:
> > On Mon, 15 Feb 2021 at 17:07, Tomas Vondra
> >  wrote:
> >>
> >> - The blocks in copyfrom.cc/copyto.c should be reworked - I don't think
> >> we do this in our codebase.
> >
> > I saw this being used in (re)index progress reporting, that's where I
> > took inspiration from. It has been fixed in the attached version.
> >
>
> Hmmm, good point. I haven't looked at the other places reporting
> progress and I only ever saw this pattern in old code. I kinda dislike
> these blocks, but admittedly that's rather subjective view. So if other
> similar places do this when reporting progress, this probably should
> too. What's your opinion on this?

Actually in the code base the style of that variable declaration and
usage of pgstat_progress_update_multi_param is a mix. For instance, in
lazy_scan_heap, ReindexRelationConcurrently, the variables are
declared at the start of the function. And in _bt_spools_heapscan,
index_build, validate_index, perform_base_backup, the variables are
declared within a separate block.

IMO, we can have the arrays declared at the start of the functions
i.e. the way it's done in v8-0001, because we can extend them for
reporting some other parameter(maybe in future).

> >> - I fir the "io_target" name misleading, because in some cases it's
> >> actually the *source*.
> >
> > Yes, I was also not quite happy with this, but couldn't find a better
> > one at the point of writing the initial patchset. Would
> > "io_operations", "io_port", "operates_through" or "through" maybe be
> > better?
> >
>
> No idea. Let's see if someone has a better proposal ...

 For COPY TO the name "source_type" column and for COPY FROM the name
"destination_type" makes sense. To have a combined column name for
both, how about naming that column as "io_type"?

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com




Re: Improvements and additions to COPY progress reporting

2021-02-18 Thread Tomas Vondra



On 2/18/21 4:46 PM, Matthias van de Meent wrote:
> Hi,
> 
> Thank you all for the suggestions. PFA version 8 of the patchset, in
> which I have applied most of your comments. Unless explicitly named
> below, I have applied the suggestions.
> 

Thanks.

> 
> On Mon, 15 Feb 2021 at 17:07, Tomas Vondra
>  wrote:
>>
>> - The blocks in copyfrom.cc/copyto.c should be reworked - I don't think
>> we do this in our codebase.
> 
> I saw this being used in (re)index progress reporting, that's where I
> took inspiration from. It has been fixed in the attached version.
> 

Hmmm, good point. I haven't looked at the other places reporting
progress and I only ever saw this pattern in old code. I kinda dislike
these blocks, but admittedly that's rather subjective view. So if other
similar places do this when reporting progress, this probably should
too. What's your opinion on this?

>> - I fir the "io_target" name misleading, because in some cases it's
>> actually the *source*.
> 
> Yes, I was also not quite happy with this, but couldn't find a better
> one at the point of writing the initial patchset. Would
> "io_operations", "io_port", "operates_through" or "through" maybe be
> better?
> 

No idea. Let's see if someone has a better proposal ...


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Improvements and additions to COPY progress reporting

2021-02-18 Thread Matthias van de Meent
Hi,

Thank you all for the suggestions. PFA version 8 of the patchset, in
which I have applied most of your comments. Unless explicitly named
below, I have applied the suggestions.


On Mon, 15 Feb 2021 at 17:07, Tomas Vondra
 wrote:
>
> - The blocks in copyfrom.cc/copyto.c should be reworked - I don't think
> we do this in our codebase.

I saw this being used in (re)index progress reporting, that's where I
took inspiration from. It has been fixed in the attached version.

> - I fir the "io_target" name misleading, because in some cases it's
> actually the *source*.

Yes, I was also not quite happy with this, but couldn't find a better
one at the point of writing the initial patchset. Would
"io_operations", "io_port", "operates_through" or "through" maybe be
better?


With regards,

Matthias van de Meent
From 40fade23e4a5c4ccf072d034b6f9b22e2fdfcfa5 Mon Sep 17 00:00:00 2001
From: Matthias van de Meent 
Date: Fri, 12 Feb 2021 14:06:44 +0100
Subject: [PATCH v8 1/3] Add progress-reported components for COPY progress
 reporting

The command, io target and excluded tuple count (by the
COPY ... FROM ... WHERE -clause) are now reported in the pg_stat_progress_copy
view. Additionally, the column lines_processed is renamed to tuples_processed
to disambiguate the meaning of this column in cases of CSV and BINARY copies
and to stay consistent with regards to names in the pg_stat_progress_*-views.

Of special interest is the reporting of io_target, with which we can
distinguish logical replications' initial table synchronization workers'
progress without having to join the pg_stat_activity view.
---
 doc/src/sgml/monitoring.sgml | 37 ++--
 src/backend/catalog/system_views.sql | 11 -
 src/backend/commands/copyfrom.c  | 27 +---
 src/backend/commands/copyto.c| 19 --
 src/include/commands/progress.h  | 15 ++-
 src/test/regress/expected/rules.out  | 15 ++-
 6 files changed, 114 insertions(+), 10 deletions(-)

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index c602ee4427..35cc938b20 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -6544,6 +6544,29 @@ SELECT pg_stat_get_backend_pid(s.backendid) AS pid,
   
  
 
+ 
+  
+   command text
+  
+  
+   The command that is running: COPY FROM, or
+   COPY TO.
+  
+ 
+
+ 
+  
+   io_target text
+  
+  
+   The io target that the data is read from or written to:
+   FILE, PROGRAM,
+   STDIO (for COPY FROM STDIN and
+   COPY TO STDOUT), or CALLBACK
+   (used in the table synchronization background worker).
+  
+ 
+
  
   
bytes_processed bigint
@@ -6565,10 +6588,20 @@ SELECT pg_stat_get_backend_pid(s.backendid) AS pid,
 
  
   
-   lines_processed bigint
+   tuples_processed bigint
+  
+  
+   Number of tuples already processed by COPY command.
+  
+ 
+
+ 
+  
+   tuples_excluded bigint
   
   
-   Number of lines already processed by COPY command.
+   Number of tuples not processed because they were excluded by the
+   WHERE clause of the COPY command.
   
  
 
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index fa58afd9d7..6a3ac47b85 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1129,9 +1129,18 @@ CREATE VIEW pg_stat_progress_copy AS
 SELECT
 S.pid AS pid, S.datid AS datid, D.datname AS datname,
 S.relid AS relid,
+CASE S.param5 WHEN 1 THEN 'COPY FROM'
+  WHEN 2 THEN 'COPY TO'
+  END AS command,
+CASE S.param6 WHEN 1 THEN 'FILE'
+  WHEN 2 THEN 'PROGRAM'
+  WHEN 3 THEN 'STDIO'
+  WHEN 4 THEN 'CALLBACK'
+  END AS io_target,
 S.param1 AS bytes_processed,
 S.param2 AS bytes_total,
-S.param3 AS lines_processed
+S.param3 AS tuples_processed,
+S.param4 AS tuples_excluded
 FROM pg_stat_get_progress_info('COPY') AS S
 LEFT JOIN pg_database D ON S.datid = D.oid;
 
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 796ca7b3f7..b74b43ab84 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -539,7 +539,8 @@ CopyFrom(CopyFromState cstate)
 	BulkInsertState bistate = NULL;
 	CopyInsertMethod insertMethod;
 	CopyMultiInsertInfo multiInsertInfo = {0};	/* pacify compiler */
-	uint64		processed = 0;
+	int64		processed = 0;
+	int64		excluded = 0;
 	bool		has_before_insert_row_trig;
 	bool		has_instead_insert_row_trig;
 	bool		leafpart_use_multi_insert = false;
@@ -869,7 +870,11 @@ CopyFrom(CopyFromState cstate)
 			econtext->ecxt_scantuple = myslot;
 			/* Skip items that don't match COPY's WHERE 

Re: Improvements and additions to COPY progress reporting

2021-02-15 Thread Tomas Vondra
Hi,

I agree with these changes in general - I have a couple minor comment:

1) 0001

- the SGML docs are missing a couple tags

- The blocks in copyfrom.cc/copyto.c should be reworked - I don't think
we do this in our codebase. Move the variable declarations to the
beginning, get rid of the out block. Or something like that.

- I fir the "io_target" name misleading, because in some cases it's
actually the *source*.


2) 0002

- I believe "each backend ... reports its" (not theirs), right?

- This seems more like a generic docs improvement, not quite specific to
the COPY progress patch. It's a bit buried, maybe it should be posted
separately. OTOH it's pretty small.


3) 0003

- Some whitespace noise, triggering "git am" warnings.

- Might be good to briefly explain what the regression test does with
the triggers, etc.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
>From 32e8bf3e84d58b43c8ab5888eb909e86bdf0ceea Mon Sep 17 00:00:00 2001
From: Matthias van de Meent 
Date: Fri, 12 Feb 2021 14:06:44 +0100
Subject: [PATCH 1/6] Add progress-reported components for COPY progress
 reporting

The command, io target and excluded tuple count (by COPY ... FROM ... s'
WHERE -clause) are now reported in the pg_stat_progress_copy view.
Additionally, the column lines_processed is renamed to tuples_processed to
disambiguate the meaning of this column in cases of CSV and BINARY copies and
to stay consistent with regards to names in the pg_stat_progress_*-views.

Of special interest is the reporting of io_target, with which we can
distinguish logical replications' initial table synchronization workers'
progress without having to join the pg_stat_activity view.
---
 doc/src/sgml/monitoring.sgml | 37 ++--
 src/backend/catalog/system_views.sql | 11 -
 src/backend/commands/copyfrom.c  | 30 +-
 src/backend/commands/copyto.c| 29 --
 src/include/commands/progress.h  | 15 ++-
 src/test/regress/expected/rules.out  | 15 ++-
 6 files changed, 129 insertions(+), 8 deletions(-)

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index c602ee4427..3c39c82f1a 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -6544,6 +6544,29 @@ SELECT pg_stat_get_backend_pid(s.backendid) AS pid,
   
  
 
+ 
+  
+   command text
+  
+  
+   The command that is running: COPY FROM, or
+   COPY TO.
+  
+ 
+
+ 
+  
+   io_target text
+  
+  
+   The io target that the data is read from or written to: 
+   FILE, PROGRAM, 
+   STDIO (for COPY FROM STDIN and COPY TO STDOUT),
+   or CALLBACK (used in the table synchronization
+   background worker).
+  
+ 
+
  
   
bytes_processed bigint
@@ -6565,10 +6588,20 @@ SELECT pg_stat_get_backend_pid(s.backendid) AS pid,
 
  
   
-   lines_processed bigint
+   tuples_processed bigint
+  
+  
+   Number of tuples already processed by COPY command.
+  
+ 
+
+ 
+  
+   tuples_excluded bigint
   
   
-   Number of lines already processed by COPY command.
+   Number of tuples not processed because they were excluded by the
+   WHERE clause of the COPY command.
   
  
 
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index fa58afd9d7..6a3ac47b85 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1129,9 +1129,18 @@ CREATE VIEW pg_stat_progress_copy AS
 SELECT
 S.pid AS pid, S.datid AS datid, D.datname AS datname,
 S.relid AS relid,
+CASE S.param5 WHEN 1 THEN 'COPY FROM'
+  WHEN 2 THEN 'COPY TO'
+  END AS command,
+CASE S.param6 WHEN 1 THEN 'FILE'
+  WHEN 2 THEN 'PROGRAM'
+  WHEN 3 THEN 'STDIO'
+  WHEN 4 THEN 'CALLBACK'
+  END AS io_target,
 S.param1 AS bytes_processed,
 S.param2 AS bytes_total,
-S.param3 AS lines_processed
+S.param3 AS tuples_processed,
+S.param4 AS tuples_excluded
 FROM pg_stat_get_progress_info('COPY') AS S
 LEFT JOIN pg_database D ON S.datid = D.oid;
 
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 796ca7b3f7..c3610eb67e 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -540,6 +540,7 @@ CopyFrom(CopyFromState cstate)
 	CopyInsertMethod insertMethod;
 	CopyMultiInsertInfo multiInsertInfo = {0};	/* pacify compiler */
 	uint64		processed = 0;
+	uint64		excluded = 0;
 	bool		has_before_insert_row_trig;
 	bool		has_instead_insert_row_trig;
 	bool		leafpart_use_multi_insert = false;
@@ -869,7 +870,11 @@ CopyFrom(CopyFromState cstate)
 			

Re: Improvements and additions to COPY progress reporting

2021-02-12 Thread Justin Pryzby
--- a/doc/src/sgml/ref/analyze.sgml 

   
+++ b/doc/src/sgml/ref/analyze.sgml 

   
@@ -273,6 +273,12 @@ ANALYZE [ VERBOSE ] [ table_and_columns  

   
+   

   
+ 

   
+Each backend running the ANALYZE command will report 
their   
  
+progress to the pg_stat_progress_analyze view.

   
+See  for details.  

   
+

   

I think this should say:

"..will report its progress to.."

Or:

"The progress of each backend running >ANALYZE< is reported in the
>pg_stat_progress_analyze< view."

-- 
Justin




Re: Improvements and additions to COPY progress reporting

2021-02-12 Thread Matthias van de Meent
On Fri, 12 Feb 2021 at 13:40, Bharath Rupireddy
 wrote:
>
> On Fri, Feb 12, 2021 at 5:40 PM Matthias van de Meent
>  wrote:
> >
> > On Fri, 12 Feb 2021 at 12:23, Matthias van de Meent
> >  wrote:
> > >
> > > On Thu, 11 Feb 2021 at 15:44, Bharath Rupireddy
> > >  wrote:
> > > >
> > > >
> > > > On Thu, Feb 11, 2021, 8:08 PM Josef Šimánek  
> > > > wrote:
> > > >> I have split it since it should be the start of progress reporting
> > > >> testing at all. If you better consider this as part of COPY testing,
> > > >> feel free to move it to already existing copy testing related files.
> > > >> There's no real reason to keep it separated if not needed.
> > > >
> > > >
> > > > +1 to move those test cases to existing copy test files.
> > >
> > > Thanks for your reviews. PFA v4 of the patchset, in which the tests
> > > are put into copy.sql (well, input/copy.source). This also adds tests
> > > for correctly reporting COPY ... FROM 'file'.
> >
> > PFA v5, which fixes a failure in the pg_upgrade regression tests due
> > to incorrect usage of @abs_builddir@. I had the changes staged, but
> > forgot to add them to the patches.
> >
> > Sorry for the noise.
>
> Looks like the patch 0001 that was adding tuples_excluded was missing
> and cfbot is also not happy with the v5 patch set.
>
> Maybe, we may not need 6 patches as they are relatively very small
> patches. IMO, the following are enough:
>
> 0001 - tuples_excluded, lines to tuples change, COPY FROM/COPY TO
> addition, io_target -- basically all the code related patches can go
> into 0001
> 0002 - documentation
> 0003 - tests - I think we can only have a simple test(in copy2.sql)
> showing stdin/stdout and not have file related tests. Because these
> patches work as expected, please find my testing below:

I agree with that split, the current split was mainly for the reason
that some of the patches (except 1, 3 and 6, which are quite
substantially different from the rest) each have had their seperate
concerns voiced about the changes contained in that patch (be it
direct or indirect); e.g. the renaming of lines_* to tuples_* is in my
opionion a good thing, and Josef disagrees.

Anyway, please find attached patchset v6 applying that split.

Regarding only a simple test: I believe it is useful to have at least
a test that distinguishes between two different import types. I've
made a mistake before, so I think it is useful to add a regression
tests to prevent someone else from making this same mistake (trivial
as it may be). Additionally, testing in copy.sql also allows for
validating the bytes_total column, which cannot be tested in copy2.sql
due to the lack of COPY FROM FILE -support over there. I'm +0.5 on
keeping it as-is in copy.sql, so unless someone has some strong
feelings about this, I'd like to keep it in copy.sql.

With regards,

Matthias van de Meent.
From daf793916a91157f917bafd669f50af3d3751ee4 Mon Sep 17 00:00:00 2001
From: Matthias van de Meent 
Date: Fri, 5 Feb 2021 23:11:50 +0100
Subject: [PATCH v6 2/3] Add backlinks to progress reporting documentation

Previously, for most progress-reported features, the only place the
feature was mentioned is in the progress reporting document itself.
This makes the progress reporting more discoverable from the reported
commands.
---
 doc/src/sgml/ref/analyze.sgml   |  7 +++
 doc/src/sgml/ref/cluster.sgml   |  6 ++
 doc/src/sgml/ref/copy.sgml  | 14 ++
 doc/src/sgml/ref/create_index.sgml  |  8 
 doc/src/sgml/ref/pg_basebackup.sgml |  1 +
 doc/src/sgml/ref/reindex.sgml   |  7 +++
 doc/src/sgml/ref/vacuum.sgml| 11 +++
 7 files changed, 54 insertions(+)

diff --git a/doc/src/sgml/ref/analyze.sgml b/doc/src/sgml/ref/analyze.sgml
index 7d816c87c6..9db9070b62 100644
--- a/doc/src/sgml/ref/analyze.sgml
+++ b/doc/src/sgml/ref/analyze.sgml
@@ -273,6 +273,12 @@ ANALYZE [ VERBOSE ] [ table_and_columns
+
+  
+Each backend running the ANALYZE command will report their
+progress to the pg_stat_progress_analyze view.
+See  for details.
+  
  
 
  
@@ -291,6 +297,7 @@ ANALYZE [ VERBOSE ] [ table_and_columns


+   
   
  
 
diff --git a/doc/src/sgml/ref/cluster.sgml b/doc/src/sgml/ref/cluster.sgml
index 5dd21a0189..5c2270f71b 100644
--- a/doc/src/sgml/ref/cluster.sgml
+++ b/doc/src/sgml/ref/cluster.sgml
@@ -192,6 +192,11 @@ CLUSTER [VERBOSE]
 are periodically reclustered.

 
+  
+Each backend running the CLUSTER command will report their
+progress to the pg_stat_progress_cluster view.
+See  for details.
+  
  
 
  
@@ -242,6 +247,7 @@ CLUSTER index_name ON 

+   
   
  
 
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 0fca6583af..af3ce72561 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -82,6 +82,12 @@ COPY { table_name [ ( 
+
+  
+Each backend running the COPY command will report their
+progress to the pg_stat_progress_copy view.
+See  for details.
+  
  

Re: Improvements and additions to COPY progress reporting

2021-02-12 Thread Bharath Rupireddy
On Fri, Feb 12, 2021 at 5:40 PM Matthias van de Meent
 wrote:
>
> On Fri, 12 Feb 2021 at 12:23, Matthias van de Meent
>  wrote:
> >
> > On Thu, 11 Feb 2021 at 15:44, Bharath Rupireddy
> >  wrote:
> > >
> > >
> > > On Thu, Feb 11, 2021, 8:08 PM Josef Šimánek  
> > > wrote:
> > >> I have split it since it should be the start of progress reporting
> > >> testing at all. If you better consider this as part of COPY testing,
> > >> feel free to move it to already existing copy testing related files.
> > >> There's no real reason to keep it separated if not needed.
> > >
> > >
> > > +1 to move those test cases to existing copy test files.
> >
> > Thanks for your reviews. PFA v4 of the patchset, in which the tests
> > are put into copy.sql (well, input/copy.source). This also adds tests
> > for correctly reporting COPY ... FROM 'file'.
>
> PFA v5, which fixes a failure in the pg_upgrade regression tests due
> to incorrect usage of @abs_builddir@. I had the changes staged, but
> forgot to add them to the patches.
>
> Sorry for the noise.

Looks like the patch 0001 that was adding tuples_excluded was missing
and cfbot is also not happy with the v5 patch set.

Maybe, we may not need 6 patches as they are relatively very small
patches. IMO, the following are enough:

0001 - tuples_excluded, lines to tuples change, COPY FROM/COPY TO
addition, io_target -- basically all the code related patches can go
into 0001
0002 - documentation
0003 - tests - I think we can only have a simple test(in copy2.sql)
showing stdin/stdout and not have file related tests. Because these
patches work as expected, please find my testing below:

postgres=# select * from pg_stat_progress_copy;
   pid   | datid | datname  | relid |  command  | io_target |
bytes_processed | bytes_total | tuples_processed | tuples_excluded
-+---+--+---+---+---+-+-+--+-
 2886103 | 12977 | postgres | 16384 | COPY FROM | FILE  |
83099648 |8595 |  9553999 | 111
(1 row)

postgres=# select * from pg_stat_progress_copy;
   pid   | datid | datname  | relid |  command  | io_target |
bytes_processed | bytes_total | tuples_processed | tuples_excluded
-+---+--+---+---+---+-+-+--+-
 2886103 | 12977 | postgres | 16384 | COPY FROM | STDIO |
 0 |   0 |0 |   0
(1 row)

postgres=# select * from pg_stat_progress_copy;
   pid   | datid | datname  | relid | command | io_target |
bytes_processed | bytes_total | tuples_processed | tuples_excluded
-+---+--+---+-+---+-+-+--+-
 2886103 | 12977 | postgres | 16384 | COPY TO | FILE  |
37771610 |   0 |  4999228 |   0
(1 row)

postgres=# select * from pg_stat_progress_copy;
   pid   | datid | datname  | relid |  command  | io_target |
bytes_processed | bytes_total | tuples_processed | tuples_excluded
-+---+--+---+---+---+-+-+--+-
 2892816 | 12977 | postgres | 16384 | COPY FROM | CALLBACK  |
249777823 |   0 | 3192 |   0
(1 row)

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com




Re: Improvements and additions to COPY progress reporting

2021-02-12 Thread Matthias van de Meent
On Fri, 12 Feb 2021 at 12:23, Matthias van de Meent
 wrote:
>
> On Thu, 11 Feb 2021 at 15:44, Bharath Rupireddy
>  wrote:
> >
> >
> > On Thu, Feb 11, 2021, 8:08 PM Josef Šimánek  wrote:
> >> I have split it since it should be the start of progress reporting
> >> testing at all. If you better consider this as part of COPY testing,
> >> feel free to move it to already existing copy testing related files.
> >> There's no real reason to keep it separated if not needed.
> >
> >
> > +1 to move those test cases to existing copy test files.
>
> Thanks for your reviews. PFA v4 of the patchset, in which the tests
> are put into copy.sql (well, input/copy.source). This also adds tests
> for correctly reporting COPY ... FROM 'file'.

PFA v5, which fixes a failure in the pg_upgrade regression tests due
to incorrect usage of @abs_builddir@. I had the changes staged, but
forgot to add them to the patches.

Sorry for the noise.

-Matthias
From 5727e7d6de2e0ffa9c5f319fb92c8eb35421f50b Mon Sep 17 00:00:00 2001
From: Matthias van de Meent 
Date: Fri, 12 Feb 2021 12:53:14 +0100
Subject: [PATCH v5 6/6] Add copy progress reporting regression tests.
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

This tests some basic features of copy progress reporting.

Sadly, we can only request one snapshot of progress_reporting each
transaction / statement, so we can't (easily) get intermediate results without
each result requiring a seperate statement being run.

Author: Josef Šimánek, Matthias van de Meent
---
 src/test/regress/input/copy.source  | 2 +-
 src/test/regress/output/copy.source | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/test/regress/input/copy.source b/src/test/regress/input/copy.source
index 0ce267e1cc..ddde33e7cc 100644
--- a/src/test/regress/input/copy.source
+++ b/src/test/regress/input/copy.source
@@ -250,7 +250,7 @@ bill	20	(11,10)	1000	sharon
 
 -- reporting of FILE imports, and correct reporting of tuples-excluded
 
-copy progress_reporting from '@abs_builddir@/data/emp.data'
+copy progress_reporting from '@abs_srcdir@/data/emp.data'
 	where (salary < 2000);
 
 -- cleanup progress_reporting
diff --git a/src/test/regress/output/copy.source b/src/test/regress/output/copy.source
index 32edc60319..60f4206aa1 100644
--- a/src/test/regress/output/copy.source
+++ b/src/test/regress/output/copy.source
@@ -202,7 +202,7 @@ create trigger check_after_progress_reporting
 copy progress_reporting from stdin;
 INFO:  progress: {"command": "COPY FROM", "datname": "regression", "io_target": "STDIO", "bytes_total": 0, "bytes_processed": 79, "tuples_excluded": 0, "tuples_processed": 3}
 -- reporting of FILE imports, and correct reporting of tuples-excluded
-copy progress_reporting from '@abs_builddir@/data/emp.data'
+copy progress_reporting from '@abs_srcdir@/data/emp.data'
 	where (salary < 2000);
 INFO:  progress: {"command": "COPY FROM", "datname": "regression", "io_target": "FILE", "bytes_total": 79, "bytes_processed": 79, "tuples_excluded": 1, "tuples_processed": 2}
 -- cleanup progress_reporting
-- 
2.20.1

From 181e95a07d7f1d3bac08f1596622e6da2519d069 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Josef=20=C5=A0im=C3=A1nek?= 
Date: Tue, 9 Feb 2021 13:06:45 +0100
Subject: [PATCH v5 5/6] Add copy progress reporting regression tests.
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

This tests some basic features of copy progress reporting.

Sadly, we can only request one snapshot of progress_reporting each
transaction / statement, so we can't (easily) get intermediate results without
each result requiring a seperate statement being run.

Author: Josef Šimánek, Matthias van de Meent
---
 src/test/regress/input/copy.source  | 57 +
 src/test/regress/output/copy.source | 44 ++
 2 files changed, 101 insertions(+)

diff --git a/src/test/regress/input/copy.source b/src/test/regress/input/copy.source
index a1d529ad36..0ce267e1cc 100644
--- a/src/test/regress/input/copy.source
+++ b/src/test/regress/input/copy.source
@@ -201,3 +201,60 @@ select * from parted_copytest where b = 1;
 select * from parted_copytest where b = 2;
 
 drop table parted_copytest;
+
+--
+-- progress reporting
+-- 
+
+-- setup
+-- reuse employer datatype, that has a small sized data set
+
+create table progress_reporting (
+	name 		text,
+	age			int4,
+	location 	point,
+	salary 		int4,
+	manager 	name
+);
+
+create function notice_after_progress_reporting() returns trigger AS
+$$
+declare report record;
+begin
+	-- We cannot expect 'pid' nor 'relid' to be consistent over runs due to 
+	-- variance in system process ids, and concurrency in runs of tests.
+	-- Additionally, due to the usage of this test in pg_regress, the 'datid'
+	-- also is not consistent between runs.
+	select into report (to_jsonb(r) - '{pid,relid,datid}'::text[]) as value
+		from pg_stat_progress_copy r
+		where pid = pg_backend_pid();
+
+	

Re: Improvements and additions to COPY progress reporting

2021-02-12 Thread Matthias van de Meent
On Thu, 11 Feb 2021 at 15:44, Bharath Rupireddy
 wrote:
>
>
> On Thu, Feb 11, 2021, 8:08 PM Josef Šimánek  wrote:
>> I have split it since it should be the start of progress reporting
>> testing at all. If you better consider this as part of COPY testing,
>> feel free to move it to already existing copy testing related files.
>> There's no real reason to keep it separated if not needed.
>
>
> +1 to move those test cases to existing copy test files.

Thanks for your reviews. PFA v4 of the patchset, in which the tests
are put into copy.sql (well, input/copy.source). This also adds tests
for correctly reporting COPY ... FROM 'file'.

I've changed the notice-alerted format from manually naming each
column to calling to_jsonb and removing the unstable columns from the
reported value; this should therefore be stable and give direct notice
to changes in the view.

With regards,

Matthias van de Meent.
From f7d761f6774753d4914d0dbc80effbb1ab09b58e Mon Sep 17 00:00:00 2001
From: Matthias van de Meent 
Date: Mon, 8 Feb 2021 17:36:00 +0100
Subject: [PATCH v4 4/6] Add a command column to the copy progress view

This allows filtering on COPY FROM / COPY TO for progress reporting, and makes
it possible to determine the further meaning of the columns involved.
---
 doc/src/sgml/monitoring.sgml | 10 ++
 src/backend/catalog/system_views.sql |  3 +++
 src/backend/commands/copyfrom.c  |  1 +
 src/backend/commands/copyto.c|  1 +
 src/include/commands/progress.h  |  5 +
 src/test/regress/expected/rules.out  |  5 +
 6 files changed, 25 insertions(+)

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 940e9dcee4..ca84b53896 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -6544,6 +6544,16 @@ SELECT pg_stat_get_backend_pid(s.backendid) AS pid,
   
  
 
+ 
+  
+   command text
+  
+  
+   The command that is running: COPY FROM, or
+   COPY TO.
+  
+ 
+
  
   
bytes_processed bigint
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index e7e227792c..1082b7d253 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1129,6 +1129,9 @@ CREATE VIEW pg_stat_progress_copy AS
 SELECT
 S.pid AS pid, S.datid AS datid, D.datname AS datname,
 S.relid AS relid,
+CASE S.param5 WHEN 1 THEN 'COPY FROM'
+  WHEN 2 THEN 'COPY TO'
+  END AS command,
 S.param1 AS bytes_processed,
 S.param2 AS bytes_total,
 S.param3 AS tuples_processed,
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index fb3c7e2c0c..ce343dbf80 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -1428,6 +1428,7 @@ BeginCopyFrom(ParseState *pstate,
 	/* initialize progress */
 	pgstat_progress_start_command(PROGRESS_COMMAND_COPY,
   cstate->rel ? RelationGetRelid(cstate->rel) : InvalidOid);
+	pgstat_progress_update_param(PROGRESS_COPY_COMMAND, PROGRESS_COPY_COMMAND_FROM);
 	cstate->bytes_processed = 0;
 
 	/* We keep those variables in cstate. */
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 9ffe7a6ee3..534c091c75 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -772,6 +772,7 @@ BeginCopyTo(ParseState *pstate,
 	/* initialize progress */
 	pgstat_progress_start_command(PROGRESS_COMMAND_COPY,
   cstate->rel ? RelationGetRelid(cstate->rel) : InvalidOid);
+	pgstat_progress_update_param(PROGRESS_COPY_COMMAND, PROGRESS_COPY_COMMAND_TO);
 	cstate->bytes_processed = 0;
 
 	MemoryContextSwitchTo(oldcontext);
diff --git a/src/include/commands/progress.h b/src/include/commands/progress.h
index 8b2b188bd5..1c30d09abb 100644
--- a/src/include/commands/progress.h
+++ b/src/include/commands/progress.h
@@ -138,5 +138,10 @@
 #define PROGRESS_COPY_BYTES_TOTAL 1
 #define PROGRESS_COPY_TUPLES_PROCESSED 2
 #define PROGRESS_COPY_TUPLES_EXCLUDED 3
+#define PROGRESS_COPY_COMMAND 4
+
+/* Commands of PROGRESS_COPY_COMMAND */
+#define PROGRESS_COPY_COMMAND_FROM 1
+#define PROGRESS_COPY_COMMAND_TO 2
 
 #endif
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 970f6909c2..63b5e33083 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1948,6 +1948,11 @@ pg_stat_progress_copy| SELECT s.pid,
 s.datid,
 d.datname,
 s.relid,
+CASE s.param5
+WHEN 1 THEN 'COPY FROM'::text
+WHEN 2 THEN 'COPY TO'::text
+ELSE NULL::text
+END AS command,
 s.param1 AS bytes_processed,
 s.param2 AS bytes_total,
 s.param3 AS tuples_processed,
-- 
2.20.1

From 181e95a07d7f1d3bac08f1596622e6da2519d069 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Josef=20=C5=A0im=C3=A1nek?= 
Date: Tue, 9 Feb 2021 13:06:45 +0100
Subject: 

Re: Improvements and additions to COPY progress reporting

2021-02-11 Thread Bharath Rupireddy
On Thu, Feb 11, 2021, 8:08 PM Josef Šimánek  wrote:

> čt 11. 2. 2021 v 15:27 odesílatel Matthias van de Meent
>  napsal:
> >
> > On Wed, 10 Feb 2021 at 07:43, Bharath Rupireddy
> >  wrote:
> > >
> > > On Tue, Feb 9, 2021 at 6:02 PM Matthias van de Meent
> > >  wrote:
> > > > > Also, you can add this to the current commitfest.
> > > >
> > > > See https://commitfest.postgresql.org/32/2977/
> > > >
> > > > On Tue, 9 Feb 2021 at 12:53, Josef Šimánek 
> wrote:
> > > > >
> > > > > OK, would you mind to integrate my regression test initial patch as
> > > > > well in v3 or should I submit it later in a separate way?
> > > >
> > > > Attached, with minor fixes
> > >
> > > Why do we need to have a new test file progress.sql for the test
> > > cases? Can't we add them into existing copy.sql or copy2.sql? Or do
> > > you have a plan to add test cases into progress.sql for other progress
> > > reporting commands?
> >
> > I don't mind moving the test into copy or copy2, but the main reason
> > to put it in a seperate file is to test the 'copy' component of the
> > feature called 'progress reporting'. If the feature instead is 'copy'
> > and 'progress reporting' is part of that feature, then I'd put it in
> > the copy-tests, but because the documentation of this has it's own
> > docs page  'progress reporting', and because 'copy' is a subsection of
> > that, I do think that this feature warrants its own regression test
> > file.
> >
> > There are no other tests for the progress reporting feature yet,
> > because COPY ... FROM is the only command that is progress reported
> > _and_ that can fire triggers while running the command, so checking
> > the progress view during the progress reported command is only
> > feasable in COPY progress reporting. To test the other progress
> > reporting views, we would need multiple sessions, which I believe is
> > impossible in this test format. Please correct me if I'm wrong; I'd
> > love to add tests for the other components. That will not be in this
> > patchset, though.
> >
> > > IMO, it's better not add any new test file but add the tests to
> existing files.
> >
> > In general I agree, but in some cases (e.g. new system component, new
> > full-fledged feature), new test files are needed. I think that this
> > could be one of those cases.
>
> I have split it since it should be the start of progress reporting
> testing at all. If you better consider this as part of COPY testing,
> feel free to move it to already existing copy testing related files.
> There's no real reason to keep it separated if not needed.
>

+1 to move those test cases to existing copy test files.


Re: Improvements and additions to COPY progress reporting

2021-02-11 Thread Josef Šimánek
čt 11. 2. 2021 v 15:27 odesílatel Matthias van de Meent
 napsal:
>
> On Wed, 10 Feb 2021 at 07:43, Bharath Rupireddy
>  wrote:
> >
> > On Tue, Feb 9, 2021 at 6:02 PM Matthias van de Meent
> >  wrote:
> > > > Also, you can add this to the current commitfest.
> > >
> > > See https://commitfest.postgresql.org/32/2977/
> > >
> > > On Tue, 9 Feb 2021 at 12:53, Josef Šimánek  
> > > wrote:
> > > >
> > > > OK, would you mind to integrate my regression test initial patch as
> > > > well in v3 or should I submit it later in a separate way?
> > >
> > > Attached, with minor fixes
> >
> > Why do we need to have a new test file progress.sql for the test
> > cases? Can't we add them into existing copy.sql or copy2.sql? Or do
> > you have a plan to add test cases into progress.sql for other progress
> > reporting commands?
>
> I don't mind moving the test into copy or copy2, but the main reason
> to put it in a seperate file is to test the 'copy' component of the
> feature called 'progress reporting'. If the feature instead is 'copy'
> and 'progress reporting' is part of that feature, then I'd put it in
> the copy-tests, but because the documentation of this has it's own
> docs page  'progress reporting', and because 'copy' is a subsection of
> that, I do think that this feature warrants its own regression test
> file.
>
> There are no other tests for the progress reporting feature yet,
> because COPY ... FROM is the only command that is progress reported
> _and_ that can fire triggers while running the command, so checking
> the progress view during the progress reported command is only
> feasable in COPY progress reporting. To test the other progress
> reporting views, we would need multiple sessions, which I believe is
> impossible in this test format. Please correct me if I'm wrong; I'd
> love to add tests for the other components. That will not be in this
> patchset, though.
>
> > IMO, it's better not add any new test file but add the tests to existing 
> > files.
>
> In general I agree, but in some cases (e.g. new system component, new
> full-fledged feature), new test files are needed. I think that this
> could be one of those cases.

I have split it since it should be the start of progress reporting
testing at all. If you better consider this as part of COPY testing,
feel free to move it to already existing copy testing related files.
There's no real reason to keep it separated if not needed.

>
> With regards,
>
> Matthias van de Meent




Re: Improvements and additions to COPY progress reporting

2021-02-11 Thread Matthias van de Meent
On Wed, 10 Feb 2021 at 07:43, Bharath Rupireddy
 wrote:
>
> On Tue, Feb 9, 2021 at 6:02 PM Matthias van de Meent
>  wrote:
> > > Also, you can add this to the current commitfest.
> >
> > See https://commitfest.postgresql.org/32/2977/
> >
> > On Tue, 9 Feb 2021 at 12:53, Josef Šimánek  wrote:
> > >
> > > OK, would you mind to integrate my regression test initial patch as
> > > well in v3 or should I submit it later in a separate way?
> >
> > Attached, with minor fixes
>
> Why do we need to have a new test file progress.sql for the test
> cases? Can't we add them into existing copy.sql or copy2.sql? Or do
> you have a plan to add test cases into progress.sql for other progress
> reporting commands?

I don't mind moving the test into copy or copy2, but the main reason
to put it in a seperate file is to test the 'copy' component of the
feature called 'progress reporting'. If the feature instead is 'copy'
and 'progress reporting' is part of that feature, then I'd put it in
the copy-tests, but because the documentation of this has it's own
docs page  'progress reporting', and because 'copy' is a subsection of
that, I do think that this feature warrants its own regression test
file.

There are no other tests for the progress reporting feature yet,
because COPY ... FROM is the only command that is progress reported
_and_ that can fire triggers while running the command, so checking
the progress view during the progress reported command is only
feasable in COPY progress reporting. To test the other progress
reporting views, we would need multiple sessions, which I believe is
impossible in this test format. Please correct me if I'm wrong; I'd
love to add tests for the other components. That will not be in this
patchset, though.

> IMO, it's better not add any new test file but add the tests to existing 
> files.

In general I agree, but in some cases (e.g. new system component, new
full-fledged feature), new test files are needed. I think that this
could be one of those cases.


With regards,

Matthias van de Meent




Re: Improvements and additions to COPY progress reporting

2021-02-09 Thread Bharath Rupireddy
On Tue, Feb 9, 2021 at 6:02 PM Matthias van de Meent
 wrote:
> > Also, you can add this to the current commitfest.
>
> See https://commitfest.postgresql.org/32/2977/
>
> On Tue, 9 Feb 2021 at 12:53, Josef Šimánek  wrote:
> >
> > OK, would you mind to integrate my regression test initial patch as
> > well in v3 or should I submit it later in a separate way?
>
> Attached, with minor fixes

Why do we need to have a new test file progress.sql for the test
cases? Can't we add them into existing copy.sql or copy2.sql? Or do
you have a plan to add test cases into progress.sql for other progress
reporting commands?

IMO, it's better not add any new test file but add the tests to existing files.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com




Re: Improvements and additions to COPY progress reporting

2021-02-09 Thread Matthias van de Meent
On Tue, 9 Feb 2021 at 08:12, Bharath Rupireddy
 wrote:
>
> On Tue, Feb 9, 2021 at 12:06 AM Matthias van de Meent
>  wrote:
> > With [0] we got COPY progress reporting. Before the column names of
> > this newly added view are effectively set in stone with the release of
> > pg14, I propose the following set of relatively small patches. These
> > are v2, because it is a patchset that is based on a set of patches
> > that I previously posted in [0].
>
> Thanks for working on the patches. Here are some comments:
>
> 0001 - +1 to add tuples_excluded and the patch LGTM.
>
> 0002 - Yes, the tuples_processed or tuples_excluded makes more sense
> to me than lines_processed and lines_excluded. The patch LGTM.
>
> 0003 - Instead of just adding the progress reporting to "See also"
> sections in the footer of the respective pages analyze, cluster and
> others, it would be nice if we have a mention of it in the description
> as pg_basebackup has something like below:
>  
>Whenever pg_basebackup is taking a base
>backup, the server's pg_stat_progress_basebackup
>view will report the progress of the backup.
>See  for details.

Added

> 0004 -
> 1) How about PROGRESS_COPY_COMMAND_TYPE instead of
> PROGRESS_COPY_COMMAND? The names looks bit confusing with the existing
> PROGRESS_COMMAND_COPY.

The current name is consistent with the naming of the other
command-reporting progress views; CREATEIDX and CLUSTER both use the
*_COMMAND as this column indexes' internal name.

> 0005 -
> 1) How about
> +   or CALLBACK (used in the table
> synchronization background
> +   worker).
> instead of
> +   or CALLBACK (used in the tablesync background
> +   worker).
> Because "table synchronization" is being used in logical-replication.sgml.

Fixed

> 2) I think cstate->copy_src = COPY_CALLBACK is assigned after the
> switch case added in copyfrom.c
> if (data_source_cb)
> {
> cstate->copy_src = COPY_CALLBACK;
> cstate->data_source_cb = data_source_cb;
> }

Yes, I noticed this too while working on the patchset, but apparently
didn't act on this... Fixed in attachted version.

> Also, you can add this to the current commitfest.

See https://commitfest.postgresql.org/32/2977/

On Tue, 9 Feb 2021 at 12:53, Josef Šimánek  wrote:
>
> OK, would you mind to integrate my regression test initial patch as
> well in v3 or should I submit it later in a separate way?

Attached, with minor fixes


With regards,

Matthias van de Meent
From 720f58890c2910bba7829c380ccb5a022596a6c8 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Josef=20=C5=A0im=C3=A1nek?= 
Date: Tue, 9 Feb 2021 13:06:45 +0100
Subject: [PATCH v3 6/6] Add progress reporting regression tests

This tests some basic features of copy progress reporting, as this is
possible to implement in one session using triggers. Other views may follow.
---
 src/test/regress/expected/progress.out | 47 ++
 src/test/regress/parallel_schedule |  2 +-
 src/test/regress/sql/progress.sql  | 43 +++
 3 files changed, 91 insertions(+), 1 deletion(-)
 create mode 100644 src/test/regress/expected/progress.out
 create mode 100644 src/test/regress/sql/progress.sql

diff --git a/src/test/regress/expected/progress.out b/src/test/regress/expected/progress.out
new file mode 100644
index 00..8feed9df42
--- /dev/null
+++ b/src/test/regress/expected/progress.out
@@ -0,0 +1,47 @@
+-- setup for COPY progress testing
+CREATE TEMP TABLE test_progress_with_trigger (
+  a int,
+  b text
+) ;
+CREATE function notice_after_progress_reporting() RETURNS trigger AS
+$$
+DECLARE report record;
+BEGIN
+  SELECT INTO report * FROM pg_stat_progress_copy report WHERE pid = pg_backend_pid();
+  raise info 'progress datname: %', report.datname;
+  raise info 'progress command: %', report.command;
+  raise info 'progress io_target: %', report.io_target;
+  raise info 'progress bytes_processed: %', report.bytes_processed;
+  raise info 'progress bytes_total: %', report.bytes_total;
+  raise info 'progress tuples_processed: %', report.tuples_processed;
+  raise info 'progress tuples_excluded: %', report.tuples_excluded;
+
+  RETURN NEW;
+END;
+$$ LANGUAGE plpgsql;
+CREATE TRIGGER check_after_progress_reporting
+AFTER INSERT ON test_progress_with_trigger
+FOR EACH ROW
+EXECUTE FUNCTION notice_after_progress_reporting();
+-- simple COPY from STDIN
+COPY test_progress_with_trigger (a, b) FROM STDIN;
+INFO:  progress datname: regression
+INFO:  progress command: COPY FROM
+INFO:  progress io_target: STDIO
+INFO:  progress bytes_processed: 9
+INFO:  progress bytes_total: 0
+INFO:  progress tuples_processed: 1
+INFO:  progress tuples_excluded: 0
+-- COPY from STDIN with WHERE skipping lines
+COPY test_progress_with_trigger (a, b) FROM STDIN WHERE a > 1;
+INFO:  progress datname: regression
+INFO:  progress command: COPY FROM
+INFO:  progress io_target: STDIO
+INFO:  progress bytes_processed: 18
+INFO:  progress bytes_total: 0

Re: Improvements and additions to COPY progress reporting

2021-02-09 Thread 0010203112132233
On Tue, 9 Feb 2021 at 09:32, Josef Šimánek  wrote:
>
> po 8. 2. 2021 v 19:35 odesílatel Matthias van de Meent
>  napsal:
> > Lastly, 0005 adds 'io_target' to the reported information, that is,
> > FILE, PROGRAM, STDIO or CALLBACK. Although this can relatively easily
> > be determined based on the commands in pg_stat_activity, it is
> > reasonably something that a user would want to query on, as the
> > origin/target of COPY has security and performance implications,
> > whereas other options (e.g. format) are less interesting for clients
> > that are not executing that specific COPY command.
>
> I took a little deeper look and I'm not sure if I understand FILE and
> STDIO. I have finally tried to finalize some initial regress testing
> of COPY command progress using triggers. I have attached the initial
> patch  applicable to your changes. As you can see COPY FROM STDIN is
> reported as FILE. That's probably expected, but it is a little
> confusing for me since STDIN and STDIO sound similar. What is the
> purpose of STDIO? When is the COPY command reported with io_target of
> STDIO?

I checked for the type of the copy_src before it was correctly set,
therefore only reporting FILE type, but this will be fixed shortly in
v3.

Matthias




Re: Improvements and additions to COPY progress reporting

2021-02-09 Thread Josef Šimánek
út 9. 2. 2021 v 12:51 odesílatel 0010203112132233  napsal:
>
> On Tue, 9 Feb 2021 at 09:32, Josef Šimánek  wrote:
> >
> > po 8. 2. 2021 v 19:35 odesílatel Matthias van de Meent
> >  napsal:
> > > Lastly, 0005 adds 'io_target' to the reported information, that is,
> > > FILE, PROGRAM, STDIO or CALLBACK. Although this can relatively easily
> > > be determined based on the commands in pg_stat_activity, it is
> > > reasonably something that a user would want to query on, as the
> > > origin/target of COPY has security and performance implications,
> > > whereas other options (e.g. format) are less interesting for clients
> > > that are not executing that specific COPY command.
> >
> > I took a little deeper look and I'm not sure if I understand FILE and
> > STDIO. I have finally tried to finalize some initial regress testing
> > of COPY command progress using triggers. I have attached the initial
> > patch  applicable to your changes. As you can see COPY FROM STDIN is
> > reported as FILE. That's probably expected, but it is a little
> > confusing for me since STDIN and STDIO sound similar. What is the
> > purpose of STDIO? When is the COPY command reported with io_target of
> > STDIO?
>
> I checked for the type of the copy_src before it was correctly set,
> therefore only reporting FILE type, but this will be fixed shortly in
> v3.

OK, would you mind to integrate my regression test initial patch as
well in v3 or should I submit it later in a separate way?

> Matthias




Re: Improvements and additions to COPY progress reporting

2021-02-09 Thread Josef Šimánek
po 8. 2. 2021 v 19:35 odesílatel Matthias van de Meent
 napsal:
>
> Hi,
>
> With [0] we got COPY progress reporting. Before the column names of
> this newly added view are effectively set in stone with the release of
> pg14, I propose the following set of relatively small patches. These
> are v2, because it is a patchset that is based on a set of patches
> that I previously posted in [0].
>
> 0001 Adds a column to pg_stat_progress_copy which details the amount
> of tuples that were excluded from insertion by the WHERE clause of the
> COPY FROM command.
>
> 0002 alters pg_stat_progress_copy to use 'tuple'-terminology instead
> of 'line'-terminology. 'Line' doesn't make sense in the binary copy
> case, and only for the 'text' copy format there can be a guarantee
> that the source / output file actually contains the reported amount of
> lines, whereas the amount of data tuples (which is also what it's
> called internally) is guaranteed to equal for all data types.
>
> There was some discussion about this in [0] where the author thought
> 'line' is more consistent with the CSV documentation, and where I
> argued that 'tuple' is both more consistent with the rest of the
> progress reporting tables and more consistent with the actual counted
> items: these are the tuples serialized / inserted (as noted in the CSV
> docs; "Thus the files are not strictly one line per table row like
> text-format files.").
>
> Patch 0003 adds backlinks to the progress reporting docs from the docs
> of the commands that have progress reporting (re/index, cluster,
> vacuum, etc.) such that progress reporting is better discoverable from
> the relevant commands, and removes the datname column from the
> progress_copy view (that column was never committed). This too should
> be fairly trivial and uncontroversial.
>
> 0004 adds the 'command' column to the progress_copy view; which
> distinguishes between COPY FROM and COPY TO. The two commands are (in
> my opinion) significantly different enough to warrant this column;
> similar to the difference between CREATE INDEX/REINDEX [CONCURRENTLY]
> which also report that information. I believe that this change is
> appropriate; as the semantics of the columns change depending on the
> command being executed.
>
> Lastly, 0005 adds 'io_target' to the reported information, that is,
> FILE, PROGRAM, STDIO or CALLBACK. Although this can relatively easily
> be determined based on the commands in pg_stat_activity, it is
> reasonably something that a user would want to query on, as the
> origin/target of COPY has security and performance implications,
> whereas other options (e.g. format) are less interesting for clients
> that are not executing that specific COPY command.

I took a little deeper look and I'm not sure if I understand FILE and
STDIO. I have finally tried to finalize some initial regress testing
of COPY command progress using triggers. I have attached the initial
patch  applicable to your changes. As you can see COPY FROM STDIN is
reported as FILE. That's probably expected, but it is a little
confusing for me since STDIN and STDIO sound similar. What is the
purpose of STDIO? When is the COPY command reported with io_target of
STDIO?

> Of special interest in 0005 is that it reports the io_target for the
> logical replications' initial tablesyncs' internal COPY. This would
> otherwise be measured, but no knowledge about the type of copy (or its
> origin) would be available on the worker's side. I'm not married to
> this patch 0005, but I believe it could be useful, and therefore
> included it in the patchset.
>
>
> With regards,
>
> Matthias van de Meent.
>
>
> [0] 
> https://www.postgresql.org/message-id/flat/CAFp7Qwr6_FmRM6pCO0x_a0mymOfX_Gg%2BFEKet4XaTGSW%3DLitKQ%40mail.gmail.com
From d91fbe3a80c1858e9cbc126e4ade44945cea9ae9 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Josef=20=C5=A0im=C3=A1nek?= 
Date: Tue, 9 Feb 2021 09:27:21 +0100
Subject: [PATCH] Add initial regress test for COPY progress reporting.

---
 src/test/regress/expected/progress.out | 43 ++
 src/test/regress/parallel_schedule |  2 +-
 src/test/regress/sql/progress.sql  | 39 +++
 3 files changed, 83 insertions(+), 1 deletion(-)
 create mode 100644 src/test/regress/expected/progress.out
 create mode 100644 src/test/regress/sql/progress.sql

diff --git a/src/test/regress/expected/progress.out b/src/test/regress/expected/progress.out
new file mode 100644
index ..2c745b946366
--- /dev/null
+++ b/src/test/regress/expected/progress.out
@@ -0,0 +1,43 @@
+-- setup for COPY progress testing
+CREATE TEMP TABLE test_progress_with_trigger (
+  a int,
+  b text
+) ;
+CREATE OR REPLACE function notice_after_progress_reporting() RETURNS trigger AS
+$$
+DECLARE report record;
+BEGIN
+  SELECT INTO report * FROM pg_stat_progress_copy report WHERE pid = pg_backend_pid();
+  raise info 'progress datname: %', report.datname;
+  raise info 'progress command: %', report.command;
+  

Re: Improvements and additions to COPY progress reporting

2021-02-08 Thread Bharath Rupireddy
On Tue, Feb 9, 2021 at 12:06 AM Matthias van de Meent
 wrote:
> With [0] we got COPY progress reporting. Before the column names of
> this newly added view are effectively set in stone with the release of
> pg14, I propose the following set of relatively small patches. These
> are v2, because it is a patchset that is based on a set of patches
> that I previously posted in [0].

Thanks for working on the patches. Here are some comments:

0001 - +1 to add tuples_excluded and the patch LGTM.

0002 - Yes, the tuples_processed or tuples_excluded makes more sense
to me than lines_processed and lines_excluded. The patch LGTM.

0003 - Instead of just adding the progress reporting to "See also"
sections in the footer of the respective pages analyze, cluster and
others, it would be nice if we have a mention of it in the description
as pg_basebackup has something like below:
 
   Whenever pg_basebackup is taking a base
   backup, the server's pg_stat_progress_basebackup
   view will report the progress of the backup.
   See  for details.

0004 -
1) How about PROGRESS_COPY_COMMAND_TYPE instead of
PROGRESS_COPY_COMMAND? The names looks bit confusing with the existing
PROGRESS_COMMAND_COPY.

0005 -
1) How about
+   or CALLBACK (used in the table
synchronization background
+   worker).
instead of
+   or CALLBACK (used in the tablesync background
+   worker).
Because "table synchronization" is being used in logical-replication.sgml.

2) I think cstate->copy_src = COPY_CALLBACK is assigned after the
switch case added in copyfrom.c
if (data_source_cb)
{
cstate->copy_src = COPY_CALLBACK;
cstate->data_source_cb = data_source_cb;
}

Also, you can add this to the current commitfest.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com




Re: Improvements and additions to COPY progress reporting

2021-02-08 Thread Josef Šimánek
po 8. 2. 2021 v 19:35 odesílatel Matthias van de Meent
 napsal:
>
> Hi,
>
> With [0] we got COPY progress reporting. Before the column names of
> this newly added view are effectively set in stone with the release of
> pg14, I propose the following set of relatively small patches. These
> are v2, because it is a patchset that is based on a set of patches
> that I previously posted in [0].

Hello. I had this in my backlog to revisit this feature as well before
the release. Thanks for picking this up.

> 0001 Adds a column to pg_stat_progress_copy which details the amount
> of tuples that were excluded from insertion by the WHERE clause of the
> COPY FROM command.
>
> 0002 alters pg_stat_progress_copy to use 'tuple'-terminology instead
> of 'line'-terminology. 'Line' doesn't make sense in the binary copy
> case, and only for the 'text' copy format there can be a guarantee
> that the source / output file actually contains the reported amount of
> lines, whereas the amount of data tuples (which is also what it's
> called internally) is guaranteed to equal for all data types.
>
> There was some discussion about this in [0] where the author thought
> 'line' is more consistent with the CSV documentation, and where I
> argued that 'tuple' is both more consistent with the rest of the
> progress reporting tables and more consistent with the actual counted
> items: these are the tuples serialized / inserted (as noted in the CSV
> docs; "Thus the files are not strictly one line per table row like
> text-format files.").

As an mentioned author I have no preference over line or tuple
terminology here. For some cases "line" terminology fits better, for
some "tuple" one. Docs can be improved later if needed to make it
clear at some cases (for example in most common case probably - CSV
import/export) tuple equals one line in CSV.

> Patch 0003 adds backlinks to the progress reporting docs from the docs
> of the commands that have progress reporting (re/index, cluster,
> vacuum, etc.) such that progress reporting is better discoverable from
> the relevant commands, and removes the datname column from the
> progress_copy view (that column was never committed). This too should
> be fairly trivial and uncontroversial.
>
> 0004 adds the 'command' column to the progress_copy view; which
> distinguishes between COPY FROM and COPY TO. The two commands are (in
> my opinion) significantly different enough to warrant this column;
> similar to the difference between CREATE INDEX/REINDEX [CONCURRENTLY]
> which also report that information. I believe that this change is
> appropriate; as the semantics of the columns change depending on the
> command being executed.

This was part of my initial patch as well, but I decided to strip it
out to make the final patch as small as possible to make it quickly
mergeable without need of further discussion. From my side this is
useful to have directly in the progress report as well.

> Lastly, 0005 adds 'io_target' to the reported information, that is,
> FILE, PROGRAM, STDIO or CALLBACK. Although this can relatively easily
> be determined based on the commands in pg_stat_activity, it is
> reasonably something that a user would want to query on, as the
> origin/target of COPY has security and performance implications,
> whereas other options (e.g. format) are less interesting for clients
> that are not executing that specific COPY command.

Similar (simplified, not supporting CALLBACK) info was also part of
the initial patch and stripped out later. I'm also +1 on this info
being useful to have directly in the progress report.

> Of special interest in 0005 is that it reports the io_target for the
> logical replications' initial tablesyncs' internal COPY. This would
> otherwise be measured, but no knowledge about the type of copy (or its
> origin) would be available on the worker's side. I'm not married to
> this patch 0005, but I believe it could be useful, and therefore
> included it in the patchset.

All patches seem good to me. I was able to apply them to current clean
master and "make check" has succeeded without problems.

>
> With regards,
>
> Matthias van de Meent.
>
>
> [0] 
> https://www.postgresql.org/message-id/flat/CAFp7Qwr6_FmRM6pCO0x_a0mymOfX_Gg%2BFEKet4XaTGSW%3DLitKQ%40mail.gmail.com