Re: Improving worst-case merge join performance with often-null foreign key

2023-04-26 Thread Steinar Kaldager
On Sun, Apr 23, 2023 at 11:30 AM Richard Guo  wrote:
> On Sat, Apr 22, 2023 at 11:21 PM Tom Lane  wrote:
>> Hmm.  I don't entirely understand why the existing stop-at-nulls logic
>> in nodeMergejoin.c didn't fix this for you.  Maybe somebody has broken
>> that?  See the commentary for MJEvalOuterValues/MJEvalInnerValues.
>
>
> I think it's just because the MergeJoin didn't see a NULL foo_id value
> from test_bar tuples because all such tuples are removed by the filter
> 'test_bar.active', thus it does not have a chance to stop at nulls.

Agreed, this is also my understanding.

Note that this isn't just a contrived test case, it's also the
situation we ran into in prod. (We had a table with a lot of old
inactive rows with null values for Historical Reasons, essentially
kept for accounting/archival purposes. Newer, active, rows all had the
foreign key column set to non-null.)

I had initially considered whether this could be fixed in the
merge-join execution code instead of by altering the plan, but at
first glance that feels like it might be a more awkward fit. It's easy
enough to stop the merge join if a null actually appears, but because
of the filter, no null will ever appear. You'd have to somehow break
the "stream of values" abstraction and look at where the values are
actually coming from and/or which values would have appeared if they
weren't filtered out. I don't know the codebase well, but to me that
feels fairly hacky compared to altering the plan for the index scan.
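A minimal standalone sketch of the effect (a toy C model, not the actual executor code): the stop-at-nulls check only applies to rows the join actually receives, and since the qual has already discarded every NULL-key row, the early stop never fires even though the scan still grinds through all the filtered rows.

```
#include <stdbool.h>
#include <stdio.h>

/* toy rows: mostly-NULL foreign-key column, NULLs sorted last */
typedef struct
{
	int			key;
	bool		isnull;
	bool		active;
} Row;

int
main(void)
{
	Row			rows[] = {
		{1, false, true}, {2, false, true},
		{0, true, false}, {0, true, false}, {0, true, false},
	};
	int			n = (int) (sizeof(rows) / sizeof(rows[0]));

	for (int i = 0; i < n; i++)
	{
		/* the plan's qual ("active") removes exactly the NULL-key rows ... */
		if (!rows[i].active)
			continue;			/* ... so the join never receives them ... */

		/* ... and the stop-at-nulls check below can never fire */
		if (rows[i].isnull)
		{
			printf("NULL merge key seen: stopping early\n");
			break;
		}
		printf("joining key %d\n", rows[i].key);
	}
	return 0;
}
```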

Steinar




Re: pg_recvlogical prints bogus error when interrupted

2023-04-26 Thread Michael Paquier
On Thu, Apr 27, 2023 at 11:24:52AM +0530, Bharath Rupireddy wrote:
> IMO, +1 for HEAD/PG16 and +0.5 for backpatching as it may not be so
> critical to backpatch all the way down. What may happen without this
> patch is that the output file isn't fsync-ed upon SIGINT/SIGTERM.
> Well, is it a critical issue on production servers?

It is also true that it's been this way for years with basically
nobody complaining outside this thread.  So there is also an argument
about leaving v16 out of the picture, and doing that only in 17~ to avoid
playing with the branch stability more than necessary?  I see 7 open
items as of today, and there are TAP tests linked to pg_recvlogical.
That should be OK, because none of these tests rely on specific
signals, but the buildfarm has some weird setups, as well..
--
Michael




Re: Autogenerate some wait events code and documentation

2023-04-26 Thread Michael Paquier
On Wed, Apr 26, 2023 at 08:36:44PM +0200, Drouvot, Bertrand wrote:
> Please find attached V5 addressing the previous comments except
> the "ordering" one (need to look deeper at this).

I was putting my hands into that, and I see now what you mean here..
Among the nine types of wait events, Lock, LWLock, BufferPin and
Extension don't get generated at all.

Generating the contents of Lock would mean to gather in a single file
the data for the generation of LockTagType in lock.h, the list of
LockTagTypeNames in lockfuncs.c and the description of the docs.  This
data being spread across three files is not really appealing to make
that generated..  LWLocks would mean extending lwlocknames.txt
with the descriptions from the docs if we were to centralize the whole
thing.

But do we need to merge more data than necessary?  We could do things
in the simplest fashion possible while making the docs and code
user-friendly in the ordering: just add a section for Lock and LWLocks
in waiteventnames.txt with an extra comment in their headers and/or
data files noting that waiteventnames.txt also needs a refresh.  I
would be tempted to do that, actually, and force an ordering for all
the wait event categories in generate-waiteventnames.pl with something
like that:
 # Generate the output files
-foreach $waitclass (keys %hashwe)
+foreach $waitclass (sort keys %hashwe)

BufferPin and Extension don't really imply an extra cost by the way:
they could just be added to the txt for the wait events even if they
have one single element for now.
--
Michael




Re: pg_recvlogical prints bogus error when interrupted

2023-04-26 Thread Bharath Rupireddy
On Tue, Apr 11, 2023 at 11:42 AM Michael Paquier  wrote:
>
> On Mon, Oct 24, 2022 at 08:15:11AM +0530, Bharath Rupireddy wrote:
> > The attached patch (pg_recvlogical_graceful_interrupt.text) has a
> > couple of problems, I believe. We're losing prepareToTerminate() with
> > keepalive true and we're not skipping pg_log_error("unexpected
> > termination of replication stream: %s" upon interrupt, after all we're
> > here discussing how to avoid it.
> >
> > I came up with the attached v2 patch, please have a look.
>
> This thread has slipped through the feature freeze deadline.  Would
> people be OK to do something now on HEAD?  A backpatch is also in
> order, IMO, as the current behavior looks confusing under SIGINT and
> SIGTERM.

IMO, +1 for HEAD/PG16 and +0.5 for backpatching as it may not be so
critical to backpatch all the way down. What may happen without this
patch is that the output file isn't fsync-ed upon SIGINT/SIGTERM.
Well, is it a critical issue on production servers?

On Fri, Apr 7, 2023 at 5:12 AM Cary Huang  wrote:
>
> The following review has been posted through the commitfest application:
>
> The patch applies and tests fine. I like the way to have both ready_to_exit 
> and time_to_abort variables to control the exit sequence. I think the (void) 
> cast can be removed in front of PQputCopyEnd(), PQflush for consistency 
> purposes as it does not give warnings and everywhere else does not have those 
> casts.

Thanks for reviewing. I removed the (void) casts like elsewhere in the
code, however, I didn't change such casts in prepareToTerminate() to
not create a diff.

I'm attaching the v4 patch for further review.
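For readers skimming the patch, the control flow reduces to the classic signal-flag idiom; here is a standalone toy sketch of that idiom (not the patch itself): the handler only sets a sig_atomic_t flag, which is the only async-signal-safe thing to do there, and the main loop notices it, fsyncs the output file, and exits with status 0.

```
#include <fcntl.h>
#include <signal.h>
#include <unistd.h>

static volatile sig_atomic_t ready_to_exit = 0;

static void
sigexit_handler(int signo)
{
	ready_to_exit = 1;			/* set a flag; do no real work here */
}

int
main(void)
{
	int			outfd = open("out.dat", O_CREAT | O_WRONLY | O_APPEND, 0644);

	signal(SIGINT, sigexit_handler);
	signal(SIGTERM, sigexit_handler);

	while (!ready_to_exit)
	{
		/* stand-in for receiving and writing replication data */
		(void) write(outfd, "x", 1);
		sleep(1);
	}

	/* interrupted: make the output durable, then exit cleanly */
	(void) fsync(outfd);
	(void) close(outfd);
	return 0;
}
```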

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
From 22195f136440fdadae0c6a0bf04c23fa16b4031c Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy 
Date: Wed, 26 Apr 2023 15:25:03 +
Subject: [PATCH v4] Fix pg_recvlogical error message upon SIGINT/SIGTERM

When pg_recvlogical gets SIGINT/SIGTERM, it emits
"unexpected termination of replication stream" error, which is
meant for really unexpected termination or a crash, but not for
SIGINT/SIGTERM. Upon SIGINT/SIGTERM, we want pg_recvlogical to
fsync the output file before exiting cleanly. This commit changes
pg_recvlogical to do that.

Reported-by: Andres Freund
Author: Bharath Rupireddy
Reviewed-by: Kyotaro Horiguchi, Andres Freund
Reviewed-by: Cary Huang
Discussion: https://www.postgresql.org/message-id/20221019213953.htdtzikf4f45ywil%40awork3.anarazel.de
---
 src/bin/pg_basebackup/pg_recvlogical.c | 29 ++
 1 file changed, 25 insertions(+), 4 deletions(-)

diff --git a/src/bin/pg_basebackup/pg_recvlogical.c b/src/bin/pg_basebackup/pg_recvlogical.c
index f3c7937a1d..337076647b 100644
--- a/src/bin/pg_basebackup/pg_recvlogical.c
+++ b/src/bin/pg_basebackup/pg_recvlogical.c
@@ -54,7 +54,8 @@ static const char *plugin = "test_decoding";
 
 /* Global State */
 static int	outfd = -1;
-static volatile sig_atomic_t time_to_abort = false;
+static bool	time_to_abort = false;
+static volatile sig_atomic_t ready_to_exit = false;
 static volatile sig_atomic_t output_reopen = false;
 static bool output_isfile;
 static TimestampTz output_last_fsync = -1;
@@ -283,6 +284,23 @@ StreamLogicalLog(void)
 			copybuf = NULL;
 		}
 
+		/* When we get SIGINT/SIGTERM, we exit */
+		if (ready_to_exit)
+		{
+			/*
+			 * Try informing the server about our exit, but don't wait around
+			 * or retry on failure.
+			 */
+			PQputCopyEnd(conn, NULL);
+			PQflush(conn);
+			time_to_abort = true;
+
+			if (verbose)
+pg_log_info("received interrupt signal, exiting");
+
+			break;
+		}
+
 		/*
 		 * Potentially send a status message to the primary.
 		 */
@@ -614,7 +632,10 @@ StreamLogicalLog(void)
 
 		res = PQgetResult(conn);
 	}
-	if (PQresultStatus(res) != PGRES_COMMAND_OK)
+
+	/* It is not an unexpected termination error when Ctrl-C'ed. */
+	if (!ready_to_exit &&
+		PQresultStatus(res) != PGRES_COMMAND_OK)
 	{
 		pg_log_error("unexpected termination of replication stream: %s",
 	 PQresultErrorMessage(res));
@@ -656,7 +677,7 @@ error:
 static void
 sigexit_handler(SIGNAL_ARGS)
 {
-	time_to_abort = true;
+	ready_to_exit = true;
 }
 
 /*
@@ -976,7 +997,7 @@ main(int argc, char **argv)
 	while (true)
 	{
 		StreamLogicalLog();
-		if (time_to_abort)
+		if (ready_to_exit || time_to_abort)
 		{
 			/*
 			 * We've been Ctrl-C'ed or reached an exit limit condition. That's
-- 
2.34.1



Re: proposal: psql: show current user in prompt

2023-04-26 Thread Pavel Stehule
Hi

rebased version + a fix for a "possibly uninitialized variable" warning

Regards

Pavel
diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml
index b11d9a6ba3..f774ffa310 100644
--- a/doc/src/sgml/protocol.sgml
+++ b/doc/src/sgml/protocol.sgml
@@ -5401,6 +5401,43 @@ psql "dbname=postgres replication=database" -c "IDENTIFY_SYSTEM;"
 

 
+   
+ReportGUC (F)
+
+ 
+  
+   Byte1('r')
+   
+
+ Identifies the message type.  ReportGUC is sent by
+ the frontend when changes of the specified GUC option
+ should be (or should no longer be) reported via ParameterStatus messages.
+
+   
+  
+
+  
+   String
+   
+
+ The name of the GUC option.
+
+   
+  
+
+  
+   Int16
+   
+
+ 1 if reporting should be enabled, 0 if reporting should be
+ disabled.
+
+   
+  
+ 
+
+   
+

 RowDescription (B)
 
diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml
index dc422373d6..569f50fe7d 100644
--- a/doc/src/sgml/ref/psql-ref.sgml
+++ b/doc/src/sgml/ref/psql-ref.sgml
@@ -4551,7 +4551,24 @@ testdb=> INSERT INTO my_table VALUES (:'content');
 The port number at which the database server is listening.
   
 
-  
+  
+%N
+
+ 
+  The database role name. This value is set by the
+  SET ROLE command. Until that command is
+  executed, the value is the database session user name.
+ 
+
+ 
+  This substitution requires PostgreSQL
+  version 16 or later. With older versions, the empty string
+  is used instead.
+ 
+
+  
+
+  
 %n
 
  
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 01b6cc1f7d..54a3f5c1e9 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -451,6 +451,11 @@ SocketBackend(StringInfo inBuf)
 			doing_extended_query_message = false;
 			break;
 
+		case 'r':/* report GUC */
+			maxmsglen = PQ_SMALL_MESSAGE_LIMIT;
+			doing_extended_query_message = false;
+			break;
+
 		default:
 
 			/*
@@ -4823,6 +4828,24 @@ PostgresMain(const char *dbname, const char *username)
 	pq_flush();
 break;
 
+			case 'r':			/* report GUC */
+{
+	const char	   *name;
+	intcreate_flag;
+
+	name = pq_getmsgstring(&input_message);
+	create_flag = pq_getmsgint(&input_message, 2);
+	pq_getmsgend(&input_message);
+
+	if (create_flag)
+		SetGUCOptionFlag(name, GUC_REPORT);
+	else
+		UnsetGUCOptionFlag(name, GUC_REPORT);
+
+	send_ready_for_query = true;
+}
+break;
+
 			case 'S':			/* sync */
 pq_getmsgend(&input_message);
 finish_xact_command();
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 9dd624b3ae..e0dccd1fa3 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -2531,6 +2531,37 @@ BeginReportingGUCOptions(void)
 	}
 }
 
+/*
+ * Allow flags on GUC variables to be set/unset dynamically
+ */
+void
+SetGUCOptionFlag(const char *name, int flag)
+{
+	struct config_generic *conf;
+
+	/* only GUC_REPORT flag is supported now */
+	Assert(flag == GUC_REPORT);
+
+	conf = find_option(name, false, true, ERROR);
+	conf->flags |= flag;
+
+	if (flag == GUC_REPORT)
+		/* force transmit value of related option to client Parameter Status */
+		ReportGUCOption(conf);
+}
+
+void
+UnsetGUCOptionFlag(const char *name, int flag)
+{
+	struct config_generic *conf;
+
+	/* only GUC_REPORT flag is supported now */
+	Assert(flag == GUC_REPORT);
+
+	conf = find_option(name, false, true, ERROR);
+	conf->flags &= ~flag;
+}
+
 /*
  * ReportChangedGUCOptions: report recently-changed GUC_REPORT variables
  *
diff --git a/src/bin/psql/command.c b/src/bin/psql/command.c
index 97f7d97220..5276da54fc 100644
--- a/src/bin/psql/command.c
+++ b/src/bin/psql/command.c
@@ -3859,6 +3859,7 @@ SyncVariables(void)
 {
 	char		vbuf[32];
 	const char *server_version;
+	int			res;
 
 	/* get stuff from connection */
 	pset.encoding = PQclientEncoding(pset.db);
@@ -3888,6 +3889,15 @@ SyncVariables(void)
 	/* send stuff to it, too */
 	PQsetErrorVerbosity(pset.db, pset.verbosity);
 	PQsetErrorContextVisibility(pset.db, pset.show_context);
+
+	/* link role GUC when it is needed for prompt */
+	if (pset.prompt_shows_role)
+		res = PQlinkParameterStatus(pset.db, "role");
+	else
+		res = PQunlinkParameterStatus(pset.db, "role");
+
+	if (res < 0)
+		pg_log_info("%s", PQerrorMessage(pset.db));
 }
 
 /*
diff --git a/src/bin/psql/prompt.c b/src/bin/psql/prompt.c
index 969cd9908e..b0f5158c59 100644
--- a/src/bin/psql/prompt.c
+++ b/src/bin/psql/prompt.c
@@ -165,6 +165,41 @@ get_prompt(promptStatus_t status, ConditionalStack cstack)
 	if (pset.db)
 		strlcpy(buf, session_username(), sizeof(buf));
 	break;
+	/* DB server user role */
+case 'N':
+			

Re: Add LZ4 compression in pg_dump

2023-04-26 Thread Michael Paquier
On Wed, Apr 26, 2023 at 08:50:46AM +, gkokola...@pm.me wrote:
> For what is worth, I think this would be the best approach. +1

Thanks.  I have gone with that, then!
--
Michael




Re: Doc limitation update proposal: include out-of-line OID usage per TOAST-ed columns

2023-04-26 Thread John Naylor
On Wed, Apr 26, 2023 at 5:18 PM Jakub Wartak 
wrote:

> OK, so here is the documentation patch proposal. I've also added two
> rows touching the subject of pg_largeobjects, as it is also related to
> the OIDs topic.

-partition keys
-32
-can be increased by recompiling
PostgreSQL
+ partition keys
+ 32
+ can be increased by recompiling
PostgreSQL

Spurious whitespace.

- limited by the number of tuples that can fit onto
4,294,967,295 pages
- 
+ limited by the number of tuples that can fit onto
4,294,967,295 pages or using up to 2^32 OIDs for TOASTed values
+ please see discussion below about OIDs

I would keep the first as is, and change the second for consistency to "see
note below on TOAST".

Also, now that we have more than one note, we should make them more
separate. That's something to discuss, no need to do anything just yet.

The new note needs a lot of editing to fit its new home. For starters:

+ 
+  For every TOAST-ed columns

column

+ (that is for field values wider than TOAST_TUPLE_TARGET
+  [2040 bytes by default]), due to internal PostgreSQL implementation of
using one
+  shared global OID counter - today you cannot have more than

+ 2^32

Perhaps it should match full numbers elsewhere in the page.

+(unsigned integer;

True but irrelevant.

+  4 billion)

Imprecise and redundant.

+ out-of-line values in a single table, because there would have to be
+  duplicated OIDs in its TOAST table.

The part after "because" should be left off.

+  Please note that that the limit of 2^32
+  out-of-line TOAST values applies to the sum of both visible and
invisible tuples.

We didn't feel the need to mention this for normal tuples...

+  It is therefore crucial that the autovacuum manages to keep up with
cleaning the
+  bloat and free the unused OIDs.
+ 

Out of place.

+ 
+  In practice, you want to have considerably less than that many TOASTed
values
+  per table, because as the OID space fills up the system might spend large
+  amounts of time searching for the next free OID when it needs to
generate a new
+  out-of-line value.

s/might spend large/will spend larger/ ?
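Maybe worth noting why the stronger wording is justified: if a fraction f of the OID space is in use and probes land roughly uniformly, finding a free OID takes about 1/(1-f) attempts on average, so the cost grows without bound as the space fills. A standalone toy calculation (not PostgreSQL's actual search logic):

```
#include <stdio.h>

int
main(void)
{
	double		fills[] = {0.50, 0.90, 0.99, 0.999};

	for (int i = 0; i < 4; i++)
		printf("%.1f%% of OIDs used -> ~%.0f probes per new value\n",
			   fills[i] * 100.0, 1.0 / (1.0 - fills[i]));
	return 0;
}
```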

+ After 100 failed attempts to get a free OID, a first log
+  message is emitted "still searching for an unused OID in relation", but
operation
+  won't stop and will try to continue until it finds the free OID.

Too much detail?

+ Therefore,
+  the OID shortages may (in very extreme cases) cause slowdowns to the
+  INSERTs/UPDATE/COPY statements.

s/may (in very extreme cases)/will eventually/

+ It's also worth emphasizing that

Unnecessary.

+ only field
+  values wider than 2KB

TOAST_TUPLE_TARGET

+ will consume TOAST OIDs in this way. So, in practice,
+  reaching this limit would require many terabytes of data in a single
table,

It may be worth mentioning what Nikita said above about updates.

+  especially if you have a wide range of value widths.

I never understood this part.

+   
+ large objects size
+ subject to the same limitations as single relation
size
+ LOs are stored in single pg_largeobjects relation
+   

Are you under the impression that we can store a single large object up to
table size? The limit is 4TB, as documented elsewhere.

+   
+ large objects number

"large objects per database"

+ subject to the same limitations as rows per
table

That implies table size is the only factor. Max OID is also a factor, which
was your stated reason to include LOs here in the first place.

+ LOs are stored in single pg_largeobjects relation

I would just say "also limited by relation size".

(note: Our catalogs are named in the singular.)

--
John Naylor
EDB: http://www.enterprisedb.com


Re: Add two missing tests in 035_standby_logical_decoding.pl

2023-04-26 Thread Amit Kapila
On Wed, Apr 26, 2023 at 4:41 PM Drouvot, Bertrand
 wrote:
>
> On 4/26/23 12:27 PM, Alvaro Herrera wrote:
> >> diff --git a/src/test/perl/PostgreSQL/Test/Cluster.pm b/src/test/perl/PostgreSQL/Test/Cluster.pm
> >> index 6f7f4e5de4..819667d42a 100644
> >> --- a/src/test/perl/PostgreSQL/Test/Cluster.pm
> >> +++ b/src/test/perl/PostgreSQL/Test/Cluster.pm
> >> @@ -2644,7 +2644,16 @@ sub wait_for_catchup
> >>  }
> >>  if (!defined($target_lsn))
> >>  {
> >> -$target_lsn = $self->lsn('write');
> >> +my $isrecovery = $self->safe_psql('postgres', "SELECT pg_is_in_recovery()");
> >> +chomp($isrecovery);
> >> +if ($isrecovery eq 't')
> >> +{
> >> +$target_lsn = $self->lsn('replay');
> >> +}
> >> +else
> >> +{
> >> +$target_lsn = $self->lsn('write');
> >> +}
> >
> > Please modify the function's documentation to account for this code change.
> >
>
> Good point, thanks! Done in V6 attached.
>

+When in recovery, the default value of target_lsn is $node->lsn('replay')
+instead. This is needed when the publisher passed to wait_for_subscription_sync()
+is a standby node.

I think this will be useful whenever wait_for_catchup has been called
for a standby node (where self is a standby node). I have tried even
by commenting wait_for_subscription_sync in the new test then it fails
for $node_standby->wait_for_catchup('tap_sub');. So instead, how about
a comment like: "When in recovery, the default value of target_lsn is
$node->lsn('replay') instead which ensures that the cascaded standby
has caught up to what has been replayed on the standby."?

-- 
With Regards,
Amit Kapila.




RE: Initial Schema Sync for Logical Replication

2023-04-26 Thread Wei Wang (Fujitsu)
On Fri, Apr 21, 2023 at 16:48 PM Masahiko Sawada  wrote:
> On Thu, Apr 20, 2023 at 8:16 PM Amit Kapila  wrote:
> >
> > On Mon, Apr 17, 2023 at 9:12 AM Masahiko Sawada
>  wrote:
> > >
> > > On Fri, Apr 7, 2023 at 6:37 PM Amit Kapila  
> > > wrote:
> > > >
> > > > On Thu, Apr 6, 2023 at 6:57 PM Masahiko Sawada
>  wrote:
> > > > >
> > > > >
> > > > > While writing a PoC patch, I found some difficulties in this idea.
> > > > > First, I tried to add schemaname+relname to pg_subscription_rel but I
> > > > > could not define the primary key of pg_subscription_rel. The primary
> > > > > key on (srsubid, srrelid) doesn't work since srrelid could be NULL.
> > > > > Similarly, the primary key on (srsubid, srrelid, schemaname, relname)
> > > > > also doesn't work.
> > > > >
> > > >
> > > > Can we think of having a separate catalog table say
> > > > pg_subscription_remote_rel for this? You can have srsubid,
> > > > remote_schema_name, remote_rel_name, etc. We may need some other
> state
> > > > to be maintained during the initial schema sync where this table can
> > > > be used. Basically, this can be used to maintain the state till the
> > > > initial schema sync is complete because we can create a relation entry
> > > > in pg_subscritption_rel only after the initial schema sync is
> > > > complete.
> > >
> > > It might not be ideal but I guess it works. But I think we need to
> > > modify the name of replication slot for initial sync as it currently
> > > includes OID of the table:
> > >
> > > void
> > > ReplicationSlotNameForTablesync(Oid suboid, Oid relid,
> > > char *syncslotname, Size szslot)
> > > {
> > > snprintf(syncslotname, szslot, "pg_%u_sync_%u_" UINT64_FORMAT, suboid,
> > >  relid, GetSystemIdentifier());
> > > }
> > >
> > > If we use both schema name and table name, it's possible that slot
> > > names are duplicated if schema and/or table names are long. Another
> > > idea is to use the hash value of schema+table names, but it cannot
> > > completely eliminate that possibility, and probably would make
> > > investigation and debugging hard in case of any failure. Probably we
> > > can use the OID of each entry in pg_subscription_remote_rel instead,
> > > but I'm not sure it's a good idea, mainly because we will end up using
> > > twice as many OIDs as before.
> > >
> >
> > The other possibility is to use worker_pid. To make debugging easier,
> > we may want to LOG schema_name+rel_name vs slot_name mapping at
> DEBUG1
> > log level.
> 
> Since worker_pid changes after the worker restarts, a new worker
> cannot find the slot that had been used, no?
> 
> After thinking it over, a better solution would be that we add an oid
> column, nspname column, and relname column to pg_subscription_rel and
> the primary key on the oid. If the table is not present on the
> subscriber we store the schema name and table name to the catalog, and
> otherwise we store the local table oid same as today. The local table
> oid will be filled after the schema sync. The names of origin and
> replication slot the tablesync worker uses use the oid instead of the
> table oid.
> 
> I've attached a PoC patch of this idea (very rough patch and has many
> TODO comments). It mixes the following changes:
> 
> 1. Add oid column to the pg_subscription_rel. The oid is used as the
> primary key and in the names of origin and slot the tablesync workers
> use.
> 
> 2. Add copy_schema = on/off option to CREATE SUBSCRIPTION (not yet
> support for ALTER SUBSCRIPTION).
> 
> 3. Add CRS_EXPORT_USE_SNAPSHOT new action in order to use the same
> snapshot by both walsender and other processes (e.g. pg_dump). In this
> patch, the snapshot is exported for pg_dump and is used by the
> walsender for COPY.
> 
> It seems to work well but there might be a pitfall as I've not fully
> implemented it.

Thanks for your POC patch.
After reviewing this patch, I have a question below that want to confirm:

1. In the function synchronize_table_schema.
I think some changes to GUC and table-related object SQLs are included in the
pg_dump result. And in this POC, these SQLs will be executed. Do we need to
alter the pg_dump results to only execute the table schema related SQLs?
For example, if we have below table schema in the publisher-side:
```
create table tbl(a int, b int);
create index idx_t on tbl (a);
CREATE FUNCTION trigger_func() RETURNS TRIGGER LANGUAGE PLPGSQL AS $$ BEGIN 
INSERT INTO public.tbl VALUES (NEW.*); RETURN NEW; END; $$;
CREATE TRIGGER tri_tbl BEFORE INSERT ON public.tbl FOR EACH ROW EXECUTE 
PROCEDURE trigger_func();
```
The result of pg_dump executed on the subscriber-side:
```
SET statement_timeout = 0;
SET lock_timeout = 0;
SET idle_in_transaction_session_timeout = 0;
SET client_encoding = 'UTF8';
SET standard_conforming_strings = on;
SELECT pg_catalog.set_config('search_path', '', false);
SET check_function_bodies = false;
SET xmloption = content;
SET client_min_messages = warning;
SET row_security = off;
```

Re: pg_stat_io not tracking smgrwriteback() is confusing

2023-04-26 Thread Kyotaro Horiguchi
At Wed, 26 Apr 2023 17:08:14 -0400, Melanie Plageman 
 wrote in 
> On Mon, Apr 24, 2023 at 9:29 PM Melanie Plageman
>  wrote:
> > I've yet to cook up a client backend test case (e.g. with COPY). I've taken
> > that as a todo.
> 
> It was trivial to see client backend writebacks in almost any scenario
> once I set backend_flush_after. I wonder if it is worth mentioning the
> various "*flush_after" gucs in the docs?
> 
> > I have a few outstanding questions:
> >
> > 1) Does it make sense for writebacks to count the number of blocks for
> > which writeback was requested or the number of calls to smgrwriteback() or
> > the number of actual syscalls made? We don't actually know from outside
> > of mdwriteback() how many FileWriteback() calls we will make.
> 
> So, in the attached v3, I've kept the first method: writebacks are the
> number of blocks which the backend has requested writeback of. I'd like
> it to be clear in the docs exactly what writebacks are (so that people
> know not to add them together with writes or something like that). I
> made an effort but could use further docs review.

+Number of units of size op_bytes which the backend
+requested the kernel write out to permanent storage.

I just want to mention that it is not necessarily the same as the
number of system calls, but I'm not sure what others think about that.

+Time spent in writeback operations in milliseconds (if
+ is enabled, otherwise zero). This
+does not necessarily count the time spent by the kernel writing the
+data out. The backend will initiate write-out of the dirty pages and
+wait only if the request queue is full.

The last sentence looks like it's taken from the sync_file_range() man
page, but I think it's a bit too detailed. We could just say, "The
time usually only includes the time it takes to queue write-out
requests.", bit I'm not sure wh...

> > 2) I'm a little nervous about not including IOObject in the writeback
> > context. Technically, there is nothing stopping local buffer code from
> > calling IssuePendingWritebacks(). Right now, local buffer code doesn't
> > do ScheduleBufferTagForWriteback(). But it doesn't seem quite right to
> > hardcode in IOOBJECT_RELATION when there is nothing wrong with
> > requesting writeback of local buffers (AFAIK). What do you think?
> 
> I've gone ahead and added IOObject to the WritebackContext.

The smgropen call in IssuePendingWritebacks below clearly shows that
the function only deals with shared buffers.

>   /* and finally tell the kernel to write the data to storage */
>   reln = smgropen(currlocator, InvalidBackendId);
>   smgrwriteback(reln, BufTagGetForkNum(&tag), tag.blockNum, 
> nblocks);

The callback-related code fully depends on callers following its
expectation. So we can rewrite the following comment added to
InitBufferPool with a more confident tone.

+* Initialize per-backend file flush context. IOObject is initialized to
+* IOOBJECT_RELATION and IOContext to IOCONTEXT_NORMAL since these are 
the
+* most likely targets for writeback. The backend can overwrite these as
+* appropriate.

Or I actually think we might not even need to pass around the io_*
parameters and could just pass immediate values to the
pgstat_count_io_op_time call. If we ever start using shared buffers
for things other than relation files (for example SLRU?), we'll have to
consider the target individually for each buffer block. That being
said, I'm fine with it either way.

Regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center


Re: Add PQsendSyncMessage() to libpq

2023-04-26 Thread Anton Kirilov
Hello,

On 25/04/2023 15:23, Denis Laxalde wrote:
> This sounds like a useful addition to me. I've played a bit with it in
> Psycopg and it works fine.

Thank you very much for reviewing my patch! I have attached a new
version of it that addresses your comments and that has been rebased on
top of the current tip of the master branch (by fixing a merge
conflict), i.e. commit 7b7fa85130330128b404eddebd4f33c6739454b0.

For the sake of others who might read this e-mail thread, I would like
to mention that my patch is complete (including documentation and tests,
but modulo review comments, of course), and that it passes the tests, i.e.:

make check
make -C src/test/modules/libpq_pipeline check
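To make the intended call pattern concrete, here is a hedged usage sketch: PQsendSyncMessage is the new function proposed by the patch below, everything else is existing libpq API, and error handling is mostly elided.

```
#include <libpq-fe.h>

int
main(void)
{
	PGconn	   *conn = PQconnectdb("dbname=postgres");

	if (PQstatus(conn) != CONNECTION_OK || !PQenterPipelineMode(conn))
		return 1;

	/* queue work using the extended query protocol */
	PQsendQueryParams(conn, "SELECT 1", 0, NULL, NULL, NULL, NULL, 0);
	PQsendQueryParams(conn, "SELECT 2", 0, NULL, NULL, NULL, NULL, 0);

	/* unlike PQpipelineSync(), this only queues the Sync message ... */
	PQsendSyncMessage(conn);

	/* ... so one explicit flush covers everything queued so far */
	while (PQflush(conn) == 1)
		;						/* real code would wait for socket writability */

	/* drain results up to the sync point */
	for (;;)
	{
		PGresult   *res = PQgetResult(conn);

		if (res == NULL)
			continue;			/* separator between queries' result sets */
		if (PQresultStatus(res) == PGRES_PIPELINE_SYNC)
		{
			PQclear(res);
			break;
		}
		PQclear(res);
	}

	PQfinish(conn);
	return 0;
}
```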

Best wishes,
Anton Kirilov
From 7aef7b2cf1ffea0786ab1fb4eca9d85ce7242cf0 Mon Sep 17 00:00:00 2001
From: Anton Kirilov 
Date: Wed, 22 Mar 2023 20:39:57 +
Subject: [PATCH v2] Add PQsendSyncMessage() to libpq

This new function is equivalent to PQpipelineSync(), except
that it does not flush anything to the server; the user must
subsequently call PQflush() instead. Its purpose is to reduce
the system call overhead of pipeline mode.
---
 doc/src/sgml/libpq.sgml   | 45 ---
 src/interfaces/libpq/exports.txt  |  1 +
 src/interfaces/libpq/fe-exec.c| 24 +-
 src/interfaces/libpq/libpq-fe.h   |  1 +
 .../modules/libpq_pipeline/libpq_pipeline.c   | 37 +++
 .../traces/multi_pipelines.trace  | 11 +
 6 files changed, 111 insertions(+), 8 deletions(-)

diff --git a/doc/src/sgml/libpq.sgml b/doc/src/sgml/libpq.sgml
index cce25d06e6..c2ee982135 100644
--- a/doc/src/sgml/libpq.sgml
+++ b/doc/src/sgml/libpq.sgml
@@ -3490,8 +3490,9 @@ ExecStatusType PQresultStatus(const PGresult *res);
   

 The PGresult represents a
-synchronization point in pipeline mode, requested by
-.
+synchronization point in pipeline mode, requested by either
+ or
+.
 This status occurs only when pipeline mode has been selected.

   
@@ -5019,7 +5020,8 @@ int PQsendDescribePortal(PGconn *conn, const char *portalName);
,
,
,
-   , or
+   ,
+   , or

call, and returns it.
A null pointer is returned when the command is complete and there
@@ -5399,8 +5401,9 @@ int PQflush(PGconn *conn);
  client sends them.  The server will begin executing the commands in the
  pipeline immediately, not waiting for the end of the pipeline.
  Note that results are buffered on the server side; the server flushes
- that buffer when a synchronization point is established with
- PQpipelineSync, or when
+ that buffer when a synchronization point is established with either
+ PQpipelineSync or
+ PQsendSyncMessage, or when
  PQsendFlushRequest is called.
  If any statement encounters an error, the server aborts the current
  transaction and does not execute any subsequent command in the queue
@@ -5457,7 +5460,8 @@ int PQflush(PGconn *conn);
  PGresult types PGRES_PIPELINE_SYNC
  and PGRES_PIPELINE_ABORTED.
  PGRES_PIPELINE_SYNC is reported exactly once for each
- PQpipelineSync at the corresponding point
+ PQpipelineSync or
+ PQsendSyncMessage at the corresponding point
  in the pipeline.
  PGRES_PIPELINE_ABORTED is emitted in place of a normal
  query result for the first error and all subsequent results
@@ -5495,7 +5499,8 @@ int PQflush(PGconn *conn);
  PQresultStatus will report a
  PGRES_PIPELINE_ABORTED result for each remaining queued
  operation in an aborted pipeline. The result for
- PQpipelineSync is reported as
+ PQpipelineSync or
+ PQsendSyncMessage is reported as
  PGRES_PIPELINE_SYNC to signal the end of the aborted pipeline
  and resumption of normal result processing.
 
@@ -5727,6 +5732,32 @@ int PQsendFlushRequest(PGconn *conn);

   
  
+
+
+ PQsendSyncMessagePQsendSyncMessage
+
+ 
+  
+   Marks a synchronization point in a pipeline by sending a
+   sync message
+   without flushing the send buffer. This serves as
+   the delimiter of an implicit transaction and an error recovery
+   point; see .
+
+
+int PQsendSyncMessage(PGconn *conn);
+
+  
+  
+   Returns 1 for success. Returns 0 if the connection is not in
+   pipeline mode or sending a
+   sync message
+   failed.
+   Note that the message is not itself flushed to the server automatically;
+   use PQflush if necessary.
+  
+ 
+

   
 
diff --git a/src/interfaces/libpq/exports.txt b/src/interfaces/libpq/exports.txt
index 7ded77aff3..3c364d692a 100644
--- a/src/interfaces/libpq/exports.txt
+++ b/src/interfaces/libpq/exports.txt
@@ -187,3 +187,4 @@ PQsetTraceFlags   184
 PQmblenBounded185
 PQsendFlushR

Re: pg_stat_io not tracking smgrwriteback() is confusing

2023-04-26 Thread Melanie Plageman
On Mon, Apr 24, 2023 at 9:29 PM Melanie Plageman
 wrote:
> I've yet to cook up a client backend test case (e.g. with COPY). I've taken
> that as a todo.

It was trivial to see client backend writebacks in almost any scenario
once I set backend_flush_after. I wonder if it is worth mentioning the
various "*flush_after" gucs in the docs?

> I have a few outstanding questions:
>
> 1) Does it make sense for writebacks to count the number of blocks for
> which writeback was requested or the number of calls to smgrwriteback() or
> the number of actual syscalls made? We don't actually know from outside
> of mdwriteback() how many FileWriteback() calls we will make.

So, in the attached v3, I've kept the first method: writebacks are the
number of blocks which the backend has requested writeback of. I'd like
it to be clear in the docs exactly what writebacks are (so that people
know not to add them together with writes or something like that). I
made an effort but could use further docs review.

> 2) I'm a little nervous about not including IOObject in the writeback
> context. Technically, there is nothing stopping local buffer code from
> calling IssuePendingWritebacks(). Right now, local buffer code doesn't
> do ScheduleBufferTagForWriteback(). But it doesn't seem quite right to
> hardcode in IOOBJECT_RELATION when there is nothing wrong with
> requesting writeback of local buffers (AFAIK). What do you think?

I've gone ahead and added IOObject to the WritebackContext.

- Melanie
From 8db3a8319553777024379add76527b8a6908028b Mon Sep 17 00:00:00 2001
From: Melanie Plageman 
Date: Mon, 24 Apr 2023 18:21:54 -0400
Subject: [PATCH v3] Add writeback to pg_stat_io

28e626bde00 added the notion of IOOps but neglected to include
writeback. With the addition of IO timing to pg_stat_io in ac8d53dae5,
the omission of writeback caused some confusion. Checkpointer write
timing in pg_stat_io often differed greatly from the write timing
written to the log. To fix this, add IOOp IOOP_WRITEBACK and track
writebacks and writeback timing in pg_stat_io.

Discussion: https://postgr.es/m/20230419172326.dhgyo4wrrhulovt6%40awork3.anarazel.de
---
 doc/src/sgml/monitoring.sgml  | 27 +++
 src/backend/catalog/system_views.sql  |  2 ++
 src/backend/postmaster/bgwriter.c | 10 --
 src/backend/storage/buffer/buf_init.c | 11 ---
 src/backend/storage/buffer/bufmgr.c   | 20 ++--
 src/backend/utils/adt/pgstatfuncs.c   |  5 +
 src/include/catalog/pg_proc.dat   |  6 +++---
 src/include/pgstat.h  |  3 ++-
 src/include/storage/buf_internals.h   |  6 +-
 src/test/regress/expected/rules.out   |  4 +++-
 src/test/regress/expected/stats.out   |  4 ++--
 src/test/regress/sql/stats.sql|  4 ++--
 12 files changed, 85 insertions(+), 17 deletions(-)

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 99f7f95c39..d6062b3d89 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -3867,6 +3867,33 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
   
  
 
+ 
+  
+   
+writebacks bigint
+   
+   
+Number of units of size op_bytes which the backend
+requested the kernel write out to permanent storage.
+   
+  
+ 
+
+ 
+  
+   
+writeback_time double precision
+   
+   
+Time spent in writeback operations in milliseconds (if
+ is enabled, otherwise zero). This
+does not necessarily count the time spent by the kernel writing the
+data out. The backend will initiate write-out of the dirty pages and
+wait only if the request queue is full.
+   
+  
+ 
+
  
   

diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 48aacf66ee..d0c932ad0e 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1131,6 +1131,8 @@ SELECT
b.read_time,
b.writes,
b.write_time,
+   b.writebacks,
+   b.writeback_time,
b.extends,
b.extend_time,
b.op_bytes,
diff --git a/src/backend/postmaster/bgwriter.c b/src/backend/postmaster/bgwriter.c
index caad642ec9..dbf866f262 100644
--- a/src/backend/postmaster/bgwriter.c
+++ b/src/backend/postmaster/bgwriter.c
@@ -129,7 +129,12 @@ BackgroundWriterMain(void)
 			 ALLOCSET_DEFAULT_SIZES);
 	MemoryContextSwitchTo(bgwriter_context);
 
-	WritebackContextInit(&wb_context, &bgwriter_flush_after);
+	/*
+	 * bgwriter will only request writeback for permanent relations in
+	 * IOCONTEXT_NORMAL.
+	 */
+	WritebackContextInit(&wb_context, IOOBJECT_RELATION, IOCONTEXT_NORMAL,
+		 &bgwriter_flush_after);
 
 	/*
 	 * If an exception is encountered, processing resumes here.
@@ -185,7 +190,8 @@ BackgroundWriterMain(void)
 		MemoryContextResetAndDeleteChildren(bgwriter_context);
 
 		/*

Re: run pgindent on a regular basis / scripted manner

2023-04-26 Thread Tom Lane
Andrew Dunstan  writes:
> On 2023-04-26 We 09:27, Tom Lane wrote:
>> Yeah, I agree, there is no case where that doesn't suck.  I don't
>> mind it imposing specific placements of brackets and so on ---
>> that's very analogous to what pgindent will do.  But it likes to
>> re-flow comma-separated lists, and generally manages to make a
>> complete logical hash of them when it does, as in your other
>> example:

> I doubt there's something like that.

I had a read-through of the latest version's man page, and found
this promising-looking entry:

-boc, --break-at-old-comma-breakpoints

The -boc flag is another way to prevent comma-separated lists from
being reformatted. Using -boc on the above example, plus additional
flags to retain the original style, yields

# perltidy -boc -lp -pt=2 -vt=1 -vtc=1
my @list = (1,
1, 1,
1, 2, 1,
1, 3, 3, 1,
1, 4, 6, 4, 1,);

A disadvantage of this flag compared to the methods discussed above is
that all tables in the file must already be nicely formatted.

I've not tested this, but it looks like it would do what we need,
modulo needing to fix all the existing damage by hand ...

regards, tom lane




Re: run pgindent on a regular basis / scripted manner

2023-04-26 Thread Andrew Dunstan


On 2023-04-26 We 09:27, Tom Lane wrote:

Peter Eisentraut  writes:

On 24.04.23 16:14, Tom Lane wrote:

I certainly don't like its current behavior where adding/changing one
line can have side-effects on nearby lines.  But we have a proposal
to clean that up, and I'm cautiously optimistic that it'll be better
in future.  Did you have other specific concerns?

I think the worst is how it handles multi-line data structures like
  $newnode->command_ok(
  [
  'psql', '-X',
  '-v',   'ON_ERROR_STOP=1',
  '-c',   $upcmds,
  '-d',   $oldnode->connstr($updb),
  ],
  "ran version adaptation commands for database $updb");

Yeah, I agree, there is no case where that doesn't suck.  I don't
mind it imposing specific placements of brackets and so on ---
that's very analogous to what pgindent will do.  But it likes to
re-flow comma-separated lists, and generally manages to make a
complete logical hash of them when it does, as in your other
example:


  $node->command_fails_like(
  [
  'pg_basebackup',   '-D',
  "$tempdir/backup", '--compress',
  $cft->[0]
  ],
  qr/$cfail/,
  'client ' . $cft->[2]);

Can we fix it to preserve the programmer's choices of line breaks
in comma-separated lists?




I doubt there's something like that. You can freeze arbitrary blocks of 
code like this (from the manual)



#<<<  format skipping: do not let perltidy change my nice formatting
    my @list = (1,
    1, 1,
    1, 2, 1,
    1, 3, 3, 1,
    1, 4, 6, 4, 1,);
#>>>


But that gets old and ugly pretty quickly.

There is a --freeze-newlines option, but it's global. I don't think we 
want that.



cheers


andrew

--
Andrew Dunstan
EDB:https://www.enterprisedb.com


Re: issue with meson builds on msys2

2023-04-26 Thread Andrew Dunstan


On 2023-04-26 We 11:30, Tom Lane wrote:

Andrew Dunstan  writes:

If I redirect the output to a file (which is what the buildfarm client
actually does), it seems like it completes successfully, but I still get
a non-zero exit:
pgrunner@EC2AMAZ-GCB871B UCRT64 ~/bf
$ /usr/bin/perl -e 'chdir "root/HEAD/instkeep.2023-04-25_11-09-41";
system("bin/pg_ctl -D data-C -l logfile stop > stoplog 2>&1") ; print
"BANG\n" if $?; '
BANG
pgrunner@EC2AMAZ-GCB871B UCRT64 ~/bf
$ cat root/HEAD/instkeep.2023-04-25_11-09-41/stoplog
waiting for server to shut down done
server stopped

Thats ... just wacko.  do_stop() emits "waiting for server to shut
down...", "done", and "server stopped" in the same way (via print_msg).
How is it that all three messages show up in one context but not the
other?  Could wait_for_postmaster_stop or get_pgpid be bollixing the
stdout channel somehow?  Try redirecting stdout and stderr separately
to see if that proves anything.



Doesn't seem to prove much:


pgrunner@EC2AMAZ-GCB871B UCRT64 ~/bf
$ /usr/bin/perl -e 'chdir "root/HEAD/instkeep.2023-04-25_11-09-41"; 
system("bin/pg_ctl -D data-C -l logfile stop > stoplog.out 2> 
stoplog.err") ; print "BANG\n" if $?; '

BANG

pgrunner@EC2AMAZ-GCB871B UCRT64 ~/bf
$ cat root/HEAD/instkeep.2023-04-25_11-09-41/stoplog.out
waiting for server to shut down done
server stopped

pgrunner@EC2AMAZ-GCB871B UCRT64 ~/bf
$ cat root/HEAD/instkeep.2023-04-25_11-09-41/stoplog.err

pgrunner@EC2AMAZ-GCB871B UCRT64 ~/bf





It seems more than odd that we get to where the "server stopped" message
is printed but we get a failure.

Indeed, that's even weirder.  do_stop() returns directly to the
exit(0) in main().





And if I call it via IPC::Run it works:


pgrunner@EC2AMAZ-GCB871B UCRT64 ~/bf
$ /usr/bin/perl -e 'chdir "root/HEAD/instkeep.2023-04-25_11-09-41"; use 
IPC::Run; my ($out, $err) = ("",""); IPC::Run::run ["bin/pg_ctl", "-D", 
"data-C", "stop"], \"",\$out,\$err ; print "BANG\n" if $?; print "out: 
$out\n" if $out; print "err: $err\n" if $err;'

out: waiting for server to shut down done
server stopped


It seems there is something odd in how msys perl (not ucrt perl) 
implements system() that is tickled by this, but why that should only 
occur when it's built using meson is completely beyond me. It should be 
just another executable. And pg_ctl is behaving properly as far as we 
can see. I'm not quite sure where to go from here. I guess I can try to 
see if we have IPC::Run and if so use it. That would probably get me 
over the hurdle for fairywren. This has already consumed far too much of 
my time.



cheers


andrew

--
Andrew Dunstan
EDB:https://www.enterprisedb.com


Re: pg_stat_io for the startup process

2023-04-26 Thread Melih Mutlu
Hi all,

On Wed, 26 Apr 2023 at 15:34, Robert Haas wrote:

> On Wed, Apr 26, 2023 at 5:47 AM Kyotaro Horiguchi
>  wrote:
> > 3. When should we call pgstat_report_stats on the startup process?
> >
> > During recovery, I think we can call pgstat_report_stats() (or a
> > subset of it) right before invoking WaitLatch and at segment
> > boundaries.
>
> I think this kind of idea is worth exploring. Andres mentioned timers,
> but it seems to me that something where we just do it at certain
> "convenient" points might be good enough and simpler. I'd much rather
> have statistics that were up to date as of the last time we finished a
> segment than have nothing at all.
>

I created a rough prototype of a timer-based approach for comparison.
Please see attached.

I registered a new timeout named "STARTUP_STAT_FLUSH_TIMEOUT". The timeout's
handler sets a flag indicating that IO stats need to be flushed, and
HandleStartupProcInterrupts checks the flag to flush stats.
The timeout is enabled whenever a WAL record is read (in the main loop of
PerformWalRecovery) and disabled before WaitLatch to avoid unnecessary
wakeups if the startup process decides to sleep.
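In portable terms, the timer-plus-flag shape looks roughly like this standalone toy sketch (using setitimer for illustration; the attached patch presumably goes through PostgreSQL's timeout.c machinery instead):

```
#include <signal.h>
#include <stdio.h>
#include <sys/time.h>
#include <unistd.h>

static volatile sig_atomic_t flush_stats_pending = 0;

static void
timeout_handler(int signo)
{
	flush_stats_pending = 1;	/* flag only; flushing happens in the loop */
}

int
main(void)
{
	struct itimerval ticker = {{1, 0}, {1, 0}};	/* fire once per second */

	signal(SIGALRM, timeout_handler);
	setitimer(ITIMER_REAL, &ticker, NULL);

	for (int record = 0; record < 5; record++)
	{
		sleep(1);				/* stand-in for replaying one WAL record */
		if (flush_stats_pending)	/* cf. HandleStartupProcInterrupts */
		{
			flush_stats_pending = 0;
			printf("flushing startup-process IO stats\n");
		}
	}
	return 0;
}
```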

With those changes, I see that startup related rows in pg_stat_io are
updated without waiting for the startup process to exit.

postgres=# select context, reads, extends, hits from pg_stat_io where
> backend_type = 'startup';
>   context  | reads  | extends |   hits
> ---++-+--
>  bulkread  |  0 | |0
>  bulkwrite |  0 |   0 |0
>  normal|  6 |   1 |   41
>  vacuum|  0 |   0 |0
> (4 rows)


I'm not sure this is the correct way to implement this approach though. I'd
appreciate any comments.

Also, some questions about this implementation, if you think they're worth
discussing:
1- I set an arbitrary timeout (1 sec) for testing. I don't know what the
correct value should be. Does it make sense to use
PGSTAT_MIN/MAX/IDLE_INTERVAL instead?
2- I'm also not sure if this timeout should be registered at the beginning
of StartupProcessMain, or does it even make any difference. I tried to do
this just before the main redo loop in PerformWalRecovery, but that made
the CI red.

Best,
-- 
Melih Mutlu
Microsoft


0001-Add-timeout-to-flush-stats-during-startup-s-main-replay-loop.patch
Description: Binary data


Re: Autogenerate some wait events code and documentation

2023-04-26 Thread Drouvot, Bertrand

Hi,

On 4/26/23 6:51 PM, Drouvot, Bertrand wrote:

Hi,

On 4/25/23 7:15 AM, Michael Paquier wrote:

Will do, no problem at all.



Please find attached V5 addressing the previous comments except
the "ordering" one (need to look deeper at this).

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
From b9a270c72ede800f2819b326aef7a7144027d861 Mon Sep 17 00:00:00 2001
From: Bertrand Drouvot 
Date: Sat, 22 Apr 2023 10:37:56 +
Subject: [PATCH v5] Generating wait_event_types.h, pgstat_wait_event.c and
 waiteventnames.sgml

The purpose is to auto-generate those files based on the newly created
waiteventnames.txt file.

Having auto generated files would help to avoid:

- wait events without a description, as observed in
https://www.postgresql.org/message-id/CA%2BhUKGJixAHc860Ej9Qzd_z96Z6aoajAgJ18bYfV3Lfn6t9%3D%2BQ%40mail.gmail.com
- orphaned wait events, as observed in
https://www.postgresql.org/message-id/CA%2BhUKGK6tqm59KuF1z%2Bh5Y8fsWcu5v8%2B84kduSHwRzwjB2aa_A%40mail.gmail.com
---
 doc/src/sgml/.gitignore   |   1 +
 doc/src/sgml/Makefile |   5 +-
 doc/src/sgml/filelist.sgml|   1 +
 doc/src/sgml/meson.build  |  10 +
 doc/src/sgml/monitoring.sgml  | 792 +-
 src/backend/Makefile  |  13 +-
 src/backend/utils/activity/Makefile   |  10 +
 .../utils/activity/generate-waiteventnames.pl | 204 +
 src/backend/utils/activity/meson.build|  24 +
 src/backend/utils/activity/wait_event.c   | 567 +
 src/backend/utils/activity/waiteventnames.txt | 245 ++
 src/include/Makefile  |   2 +-
 src/include/utils/.gitignore  |   1 +
 src/include/utils/meson.build |  18 +
 src/include/utils/wait_event.h| 214 +
 src/tools/msvc/Solution.pm|  18 +
 src/tools/msvc/clean.bat  |   3 +
 17 files changed, 556 insertions(+), 1572 deletions(-)
  36.2% doc/src/sgml/
  52.9% src/backend/utils/activity/
   8.6% src/include/utils/

diff --git a/doc/src/sgml/.gitignore b/doc/src/sgml/.gitignore
index d8e3dab338..e8cbd687d5 100644
--- a/doc/src/sgml/.gitignore
+++ b/doc/src/sgml/.gitignore
@@ -17,6 +17,7 @@
 /errcodes-table.sgml
 /keywords-table.sgml
 /version.sgml
+/waiteventnames.sgml
 # Assorted byproducts from building the above
 /postgres-full.xml
 /INSTALL.html
diff --git a/doc/src/sgml/Makefile b/doc/src/sgml/Makefile
index 71cbef230f..01a6aa8187 100644
--- a/doc/src/sgml/Makefile
+++ b/doc/src/sgml/Makefile
@@ -58,7 +58,7 @@ override XSLTPROCFLAGS += --stringparam pg.version '$(VERSION)'
 
 GENERATED_SGML = version.sgml \
 	features-supported.sgml features-unsupported.sgml errcodes-table.sgml \
-	keywords-table.sgml
+	keywords-table.sgml waiteventnames.sgml
 
 ALLSGML := $(wildcard $(srcdir)/*.sgml $(srcdir)/ref/*.sgml) $(GENERATED_SGML)
 
@@ -111,6 +111,9 @@ errcodes-table.sgml: $(top_srcdir)/src/backend/utils/errcodes.txt generate-errco
 keywords-table.sgml: $(top_srcdir)/src/include/parser/kwlist.h $(wildcard $(srcdir)/keywords/sql*.txt) generate-keywords-table.pl
 	$(PERL) $(srcdir)/generate-keywords-table.pl $(srcdir) > $@
 
+waiteventnames.sgml: $(top_srcdir)/src/backend/utils/activity/waiteventnames.txt $(top_srcdir)/src/backend/utils/activity/generate-waiteventnames.pl
+	$(PERL) $(top_srcdir)/src/backend/utils/activity/generate-waiteventnames.pl $< > $@
+	rm pgstat_wait_event.c; rm wait_event_types.h
 
 ##
 ## Generation of some text files.
diff --git a/doc/src/sgml/filelist.sgml b/doc/src/sgml/filelist.sgml
index 0d6be9a2fa..9ab235d36a 100644
--- a/doc/src/sgml/filelist.sgml
+++ b/doc/src/sgml/filelist.sgml
@@ -42,6 +42,7 @@
 
 
 
+
 
 
 
diff --git a/doc/src/sgml/meson.build b/doc/src/sgml/meson.build
index c6d77b5a15..53e4505b97 100644
--- a/doc/src/sgml/meson.build
+++ b/doc/src/sgml/meson.build
@@ -46,6 +46,16 @@ doc_generated += custom_target('errcodes-table.sgml',
   capture: true,
 )
 
+doc_generated += custom_target('waiteventnames.sgml',
+  input: files(
+'../../../src/backend/utils/activity/waiteventnames.txt'),
+  output: 'waiteventnames.sgml',
+  command: [perl, files('../../../src/backend/utils/activity/generate-waiteventnames.pl'), '@INPUT@'],
+  build_by_default: false,
+  install: false,
+  capture: true,
+)
+
 # FIXME: this actually has further inputs, adding depfile support to
 # generate-keywords-table.pl is probably the best way to address that
 # robustly.
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 99f7f95c39..c8648fca18 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -1100,74 +1100,7 @@ postgres   27093  0.0  0.0  30096  2752 ?Ss   11:34   0:00 postgres: ser

   
 
-  
-   Wait Events of Type Activity
-   
-
- 
- 

Re: Autogenerate some wait events code and documentation

2023-04-26 Thread Drouvot, Bertrand

Hi,

On 4/25/23 7:15 AM, Michael Paquier wrote:

On Mon, Apr 24, 2023 at 09:03:53AM +0200, Drouvot, Bertrand wrote:

Oh right, fixed.


I may tweak a few things if I put my hands on it, but that looks
pretty solid seen from here.. 


Glad to hear! ;-)


I have spotted a few extra issues.

One thing I have noticed with v4 is that the order of the tables
generated in wait_event_types.h and the SGML docs is inconsistent with
previous versions, and these are not in an alphabetical order.  HEAD
orders them as Activity, BufferPin, Client, Extension, IO, IPC, Lock,
LWLock and Timeout.  This patch switches the order to become
different, and note that the first table describing each of the wait
event type classes gets it right.



Right, the ordering being broken is something I mentioned when I first
presented this patch up-thread. That's also due to the fact that this patch
does not autogenerate PG_WAIT_LWLOCK, PG_WAIT_LOCK, PG_WAIT_BUFFER_PIN and
PG_WAIT_EXTENSION.


It seems to me that you should apply an extra ordering in
generate-waiteventnames.pl to make sure that the tables are printed in
the same order as previously, around here:
+# Generate the output files
+foreach $waitclass (keys %hashwe) {



Yeah, but that would still affect only the auto-generated ones, and would then
result in the wait event part of the "monitoring-stats" documentation not being
ordered consistently with the "Wait Event Types" table.

And as we have only one "include" in doc/src/sgml/monitoring.sgml for all the
auto-generated one, the ordering would still be broken.


(The table describing all the wait event types could be removed from
the SGML docs as well, at the cost of having their description in the
new .txt file.  However, as these are long, it would make the .txt
file much messier, so not doing this extra part is OK for me.)


Right, but that might be a valuable option to also fix the ordering issue
mentioned above (need to look deeper at this).



- * Use this category when a process is waiting because it has no work to do,
- * unless the "Client" or "Timeout" category describes the situation better.
- * Typically, this should only be used for background processes

wait_event.h includes a set of comments describing each category, that
this patch removes.  Rather than removing this information, which is
helpful to have around, why not making them comments of
waiteventnames.txt instead?  Losing this information would be sad.



Yeah, good point, I'll look at this.


+#   src/backend/utils/activity/pgstat_wait_event.c
+#  c functions to get the wait event name based on the enum
Nit.  'c' should be upper-case.

 }
+
 if (IsNewer(
 'src/include/storage/lwlocknames.h',
Not wrong, but this is an unrelated diff.



Yeah, probably due to a pgindent run.


+if %DIST%==1 if exist src\backend\utils\activity\pgstat_wait_event.c del /q src\backend\utils\activity\pgstat_wait_event.c
 if %DIST%==1 if exist src\backend\storage\lmgr\lwlocknames.h del /q src\backend\storage\lmgr\lwlocknames.h
+if %DIST%==1 if exist src\backend\utils\activity\wait_event_types.h del /q src\backend\utils\activity\wait_event_types.h
The order here is off a bit.  Missed that previously..

perltidy had better be run on generate-waiteventnames.pl (I can do
that myself, no worries), as a couple of lines' format don't seem
quite in line.


Will do, no problem at all.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: issue with meson builds on msys2

2023-04-26 Thread Tom Lane
Andrew Dunstan  writes:
> If I redirect the output to a file (which is what the buildfarm client 
> actually does), it seems like it completes successfully, but I still get 
> a non-zero exit:

> pgrunner@EC2AMAZ-GCB871B UCRT64 ~/bf
> $ /usr/bin/perl -e 'chdir "root/HEAD/instkeep.2023-04-25_11-09-41"; 
> system("bin/pg_ctl -D data-C -l logfile stop > stoplog 2>&1") ; print 
> "BANG\n" if $?; '
> BANG

> pgrunner@EC2AMAZ-GCB871B UCRT64 ~/bf
> $ cat root/HEAD/instkeep.2023-04-25_11-09-41/stoplog
> waiting for server to shut down done
> server stopped

Thats ... just wacko.  do_stop() emits "waiting for server to shut
down...", "done", and "server stopped" in the same way (via print_msg).
How is it that all three messages show up in one context but not the
other?  Could wait_for_postmaster_stop or get_pgpid be bollixing the
stdout channel somehow?  Try redirecting stdout and stderr separately
to see if that proves anything.

> It seems more than odd that we get to where the "server stopped" message
> is printed but we get a failure.

Indeed, that's even weirder.  do_stop() returns directly to the
exit(0) in main().

regards, tom lane




Re: Fix for visibility check on 14.5 fails on tpcc with high concurrency

2023-04-26 Thread Dimos Stamatakis
Hi hackers,

I was wondering whether there are any updates on the bug in the visibility
check introduced in version 14.5.

Many thanks,
Dimos


Re: issue with meson builds on msys2

2023-04-26 Thread Andrew Dunstan


On 2023-04-26 We 10:58, Tom Lane wrote:

I wrote:

Looking at the pg_ctl source code, the only way I can explain that
printout is that do_stop called wait_for_postmaster_stop which,
after one or more loops, exited via one of its exit() calls.

Ah, a little too hasty there: it's get_pgpid() that has to be
reaching an exit().




If I redirect the output to a file (which is what the buildfarm client 
actually does), it seems like it completes successfully, but I still get 
a non-zero exit:



pgrunner@EC2AMAZ-GCB871B UCRT64 ~/bf
$ /usr/bin/perl -e 'chdir "root/HEAD/instkeep.2023-04-25_11-09-41"; 
system("bin/pg_ctl -D data-C -l logfile stop > stoplog 2>&1") ; print 
"BANG\n" if $?; '

BANG

pgrunner@EC2AMAZ-GCB871B UCRT64 ~/bf
$ cat root/HEAD/instkeep.2023-04-25_11-09-41/stoplog
waiting for server to shut down done
server stopped


It seems more than odd that we get to where the "server stopped" message
is printed but we get a failure.



cheers


andrew

--
Andrew Dunstan
EDB:https://www.enterprisedb.com


Re: issue with meson builds on msys2

2023-04-26 Thread Tom Lane
I wrote:
> Looking at the pg_ctl source code, the only way I can explain that
> printout is that do_stop called wait_for_postmaster_stop which,
> after one or more loops, exited via one of its exit() calls.

Ah, a little too hasty there: it's get_pgpid() that has to be
reaching an exit().

regards, tom lane




Re: issue with meson builds on msys2

2023-04-26 Thread Tom Lane
Andrew Dunstan  writes:
>> For some reason which makes no sense to me the buildfarm animal fails 
>> at the first Stop-Db step. The DB is actually stopped, but pg_ctl 
>> returns a non-zero status. The thing that's really odd is that meson 
>> isn't at all involved in this step. But it's happened enough that I've 
>> had to back off using meson builds on fairywren - I'm going to do more 
>> testing on a new Windows instance.

> Here's a simple illustration of the problem. If I do the identical test 
> with a non-meson build there is no problem:

> pgrunner@EC2AMAZ-GCB871B UCRT64 ~/bf
> $ /usr/bin/perl -e 'chdir "root/HEAD/instkeep.2023-04-25_11-09-41"; 
> system("bin/pg_ctl -D data-C -l logfile stop") ; print "fail\n" if $?; '
> waiting for server to shut downfail

Looking at the pg_ctl source code, the only way I can explain that
printout is that do_stop called wait_for_postmaster_stop which,
after one or more loops, exited via one of its exit() calls.
The lack of any message can be explained if we imagine that
write_stderr() output is going to the bit bucket.  I'd start by
changing those write_stderr's to print_msg(), which visibly
does work; that should confirm the existence of the stderr
issue and show you how wait_for_postmaster_stop is failing.
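
For example, a temporary hack along these lines (a sketch only - the actual
write_stderr() call sites and message texts in pg_ctl.c differ):

	/* in get_pgpid() / wait_for_postmaster_stop(), for diagnosis only */
-	write_stderr(_("%s: could not open PID file \"%s\": %s\n"),
-				 progname, pid_file, strerror(errno));
+	print_msg(_("could not open PID file, exiting\n"));
 	exit(1);

If those messages then start showing up on stdout, that confirms the stderr
channel is the culprit.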

regards, tom lane




Re: Add two missing tests in 035_standby_logical_decoding.pl

2023-04-26 Thread Drouvot, Bertrand

Hi,

On 4/26/23 11:58 AM, Yu Shi (Fujitsu) wrote:

On Mon, Apr 24, 2023 8:07 PM Drouvot, Bertrand  
wrote:



I think that's because when replaying a checkpoint record, the startup process
of the standby only saves the checkpoint information, and we need to wait for
the checkpointer to perform a restartpoint (see RecoveryRestartPoint), right? If
so, could we force a checkpoint on the standby? After that, the standby should
have completed the restartpoint and we wouldn't need to wait.



Thanks for looking at it!

Oh right, that looks like a good way to ensure the WAL file is removed on
the standby so that we don't need to wait.

Implemented that way in V6 attached and that works fine.


Besides, would it be better to wait for the cascading standby? If the WAL file
needed by the cascading standby is removed on the standby, the subsequent test
will fail.


Good catch! I agree that we have to wait on the cascading standby before
removing the WAL files. It's done in V6 (and the test no longer fails if we
set recovery_min_apply_delay to 5s on the cascading standby).

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.comFrom 79f554eaf8185a2d34dc48ba31f1a3b3cd09c185 Mon Sep 17 00:00:00 2001
From: Bertrand Drouvot 
Date: Tue, 25 Apr 2023 06:02:17 +
Subject: [PATCH v6] Add retained WAL test in 035_standby_logical_decoding.pl

Adding one test, to verify that invalidated logical slots do not lead to
retaining WAL.
---
 .../t/035_standby_logical_decoding.pl | 69 ++-
 1 file changed, 67 insertions(+), 2 deletions(-)
 100.0% src/test/recovery/t/

diff --git a/src/test/recovery/t/035_standby_logical_decoding.pl 
b/src/test/recovery/t/035_standby_logical_decoding.pl
index f6d6447412..ea9ca46995 100644
--- a/src/test/recovery/t/035_standby_logical_decoding.pl
+++ b/src/test/recovery/t/035_standby_logical_decoding.pl
@@ -495,9 +495,74 @@ $node_standby->restart;
 check_slots_conflicting_status(1);
 
 ##
-# Verify that invalidated logical slots do not lead to retaining WAL
+# Verify that invalidated logical slots do not lead to retaining WAL.
 ##
-# X TODO
+
+# Before removing WAL files, ensure the cascading standby has caught up
+$node_standby->wait_for_replay_catchup($node_cascading_standby, $node_primary);
+
+# Get the restart_lsn from an invalidated slot
+my $restart_lsn = $node_standby->safe_psql('postgres',
+   "SELECT restart_lsn from pg_replication_slots WHERE slot_name = 
'vacuum_full_activeslot' and conflicting is true;"
+);
+
+chomp($restart_lsn);
+
+# Get the WAL file name associated with this LSN on the primary
+my $walfile_name = $node_primary->safe_psql('postgres',
+   "SELECT pg_walfile_name('$restart_lsn')");
+
+chomp($walfile_name);
+
+# Check the WAL file is still on the primary
+ok(-f $node_primary->data_dir . '/pg_wal/' . $walfile_name,
+   "WAL file still on the primary");
+
+# Get the number of WAL files on the standby
+my $nb_standby_files = $node_standby->safe_psql('postgres',
+   "SELECT COUNT(*) FROM pg_ls_dir('pg_wal')");
+
+chomp($nb_standby_files);
+
+# Switch WAL files on the primary
+my @c = (1 .. $nb_standby_files);
+
+$node_primary->safe_psql('postgres', "create table retain_test(a int)");
+
+for (@c)
+{
+   $node_primary->safe_psql(
+   'postgres', "SELECT pg_switch_wal();
+  insert into retain_test values("
+ . $_ . ");");
+}
+
+# Ask for a checkpoint
+$node_primary->safe_psql('postgres', 'checkpoint;');
+
+# Check that the WAL file has not been retained on the primary
+ok(!-f $node_primary->data_dir . '/pg_wal/' . $walfile_name,
+   "WAL file not on the primary anymore");
+
+# Wait for the standby to catch up
+$node_primary->wait_for_catchup($node_standby);
+
+# Generate another WAL switch, more activity and a checkpoint
+$node_primary->safe_psql(
+   'postgres', "SELECT pg_switch_wal();
+ insert into retain_test values(1);");
+$node_primary->safe_psql('postgres', 'checkpoint;');
+
+# Wait for the standby to catch up
+$node_primary->wait_for_catchup($node_standby);
+
+# Request a checkpoint on the standby to trigger the WAL file(s) removal
+$node_standby->safe_psql('postgres', 'checkpoint;');
+
+# Verify that the wal file has not been retained on the standby
+my $standby_walfile = $node_standby->data_dir . '/pg_wal/' . $walfile_name;
+ok( !-f "$standby_walfile",
+"invalidated logical slots do not lead to retaining WAL");
 
 ##
 # Recovery conflict: Invalidate conflicting slots, including in-use slots
-- 
2.34.1



Re: vector search support

2023-04-26 Thread Giuseppe Broccolo
Hi Nathan,

I find the patches really interesting. Personally, as a Data/MLOps Engineer,
I'm involved in a project where we use embedding techniques to generate
vectors from documents, and use clustering and kNN searches to find similar
documents based on the spatial neighbourhood of the generated vectors.

We finally opted for ElasticSearch as our search engine, since it provided
what we needed:

* support to store dense vectors
* support for kNN searches (last version of ElasticSearch allows this)

An internal benchmark showed us that we were able to achieve the expected
performance, although we are still missing a few pieces:

* clustering of vectors (this has to be done outside the search engine,
using DBSCAN for our use case)
* concurrency in updating the ElasticSearch indexes storing the dense
vectors

I found these patches really interesting, considering that they would solve
some of the open issues around storing dense vectors. Index support would
help a lot with searches, though.

I'm not sure whether PostgreSQL core is the best place for this, but it
would be fantastic to have it as an extension.

All the best,
Giuseppe.

On Sat, 22 Apr 2023, 01:07 Nathan Bossart,  wrote:

> Attached is a proof-of-concept/work-in-progress patch set that adds
> functions for "vectors" represented with one-dimensional float8 arrays.
> These functions may be used in a variety of applications, but I am
> proposing them with the AI/ML use-cases in mind.  I am posting this early
> in the v17 cycle in hopes of gathering feedback prior to PGCon.
>
> With the accessibility of AI/ML tools such as large language models (LLMs),
> there has been a demand for storing and manipulating high-dimensional
> vectors in PostgreSQL, particularly around nearest-neighbor queries.  Many
> of these vectors have more than 1500 dimensions.  The cube extension [0]
> provides some of the distance functionality (e.g., taxicab, Euclidean, and
> Chebyshev), but it is missing some popular functions (e.g., cosine
> similarity, dot product), and it is limited to 100 dimensions.  We could
> extend cube to support more dimensions, but this would require reworking
> its indexing code and filling in gaps between the cube data type and the
> array types.  For some previous discussion about using the cube extension
> for this kind of data, see [1].
>
> float8[] is well-supported and allows for effectively unlimited dimensions
> of data.  float8 matches the common output format of many AI embeddings,
> and it allows us or extensions to implement indexing methods around these
> functions.  This patch set does not yet contain indexing support, but we
> are exploring using GiST or GIN for the use-cases in question.  It might
> also be desirable to add support for other linear algebra operations (e.g.,
> operations on matrices).  The attached patches likely only scratch the
> surface of the "vector search" use-case.
>
> The patch set is broken up as follows:
>
>  * 0001 does some minor refactoring of dsqrt() in preparation for 0002.
>  * 0002 adds several vector-related functions, including distance functions
>and a kmeans++ implementation.
>  * 0003 adds support for optionally using the OpenBLAS library, which is an
>implementation of the Basic Linear Algebra Subprograms [2]
>specification.  Basic testing with this library showed a small
>performance boost, although perhaps not enough to justify giving this
>patch serious consideration.
>
> Of course, there are many open questions.  For example, should PostgreSQL
> support this stuff out-of-the-box in the first place?  And should we
> introduce a vector data type or SQL domains for treating float8[] as
> vectors?  IMHO these vector search use-cases are an exciting opportunity
> for the PostgreSQL project, so I am eager to hear what folks think.
>
> [0] https://www.postgresql.org/docs/current/cube.html
> [1] https://postgr.es/m/2271927.1593097400%40sss.pgh.pa.us
> [2] https://en.wikipedia.org/wiki/Basic_Linear_Algebra_Subprograms
>
> --
> Nathan Bossart
> Amazon Web Services: https://aws.amazon.com
>
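
For readers following along, cosine similarity (one of the functions the
cube extension is said to lack above) is just dot(a,b) / (|a| * |b|).
A minimal standalone C sketch, independent of the actual patch set:

#include <math.h>
#include <stdio.h>

/* cosine similarity of two double vectors; NAN if either has zero length */
static double
cosine_similarity(const double *a, const double *b, int dim)
{
	double		dot = 0.0,
				na = 0.0,
				nb = 0.0;

	for (int i = 0; i < dim; i++)
	{
		dot += a[i] * b[i];
		na += a[i] * a[i];
		nb += b[i] * b[i];
	}
	if (na == 0.0 || nb == 0.0)
		return NAN;
	return dot / (sqrt(na) * sqrt(nb));
}

int
main(void)
{
	double		v1[] = {1.0, 2.0, 3.0};
	double		v2[] = {2.0, 4.0, 6.0};	/* parallel to v1 */

	printf("%f\n", cosine_similarity(v1, v2, 3));	/* prints 1.000000 */
	return 0;
}

Parallel vectors score 1.0, orthogonal ones 0.0, and opposite ones -1.0,
which is what makes this a convenient similarity measure for embeddings.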


Re: run pgindent on a regular basis / scripted manner

2023-04-26 Thread Tom Lane
Peter Eisentraut  writes:
> On 24.04.23 16:14, Tom Lane wrote:
>> I certainly don't like its current behavior where adding/changing one
>> line can have side-effects on nearby lines.  But we have a proposal
>> to clean that up, and I'm cautiously optimistic that it'll be better
>> in future.  Did you have other specific concerns?

> I think the worst is how it handles multi-line data structures like

>  $newnode->command_ok(
>  [
>  'psql', '-X',
>  '-v',   'ON_ERROR_STOP=1',
>  '-c',   $upcmds,
>  '-d',   $oldnode->connstr($updb),
>  ],
>  "ran version adaptation commands for database $updb");

Yeah, I agree, there is no case where that doesn't suck.  I don't
mind it imposing specific placements of brackets and so on ---
that's very analogous to what pgindent will do.  But it likes to
re-flow comma-separated lists, and generally manages to make a
complete logical hash of them when it does, as in your other
example:

>  $node->command_fails_like(
>  [
>  'pg_basebackup',   '-D',
>  "$tempdir/backup", '--compress',
>  $cft->[0]
>  ],
>  qr/$cfail/,
>  'client ' . $cft->[2]);

Can we fix it to preserve the programmer's choices of line breaks
in comma-separated lists?

regards, tom lane




Re: [PATCH] Compression dictionaries for JSONB

2023-04-26 Thread Aleksander Alekseev
Hi Nikita,

> The External TOAST pointer is very limited to the amount of service data
> it could keep, that's why we introduced the Custom TOAST pointers in the
> Pluggable TOAST. But keep in mind that changing the TOAST pointer
> structure requires a lot of quite heavy modifications in the core - along with
> the obvious places like insert/update/delete of a datum, there is a very
> serious issue with logical replication.
> The Pluggable TOAST was rejected, but we have a lot of improvements
> based on changing the TOAST pointer structure.

Now I see what you meant [1]. I agree that we should focus on
refactoring TOAST pointers first. So I suggest we continue discussing
this in a corresponding thread and return to this one later.

[1]: 
https://www.postgresql.org/message-id/CAJ7c6TPSvR2rKpoVX5TSXo_kMxXF%2B-SxLtrpPaMf907tX%3DnVCw%40mail.gmail.com

-- 
Best regards,
Aleksander Alekseev




Re: Option to not use ringbuffer in VACUUM, using it in failsafe mode

2023-04-26 Thread Melanie Plageman
On Wed, Apr 26, 2023 at 8:31 AM Daniel Gustafsson  wrote:

> > On 26 Apr 2023, at 13:26, David Rowley  wrote:
> > On Wed, 26 Apr 2023, 8:48 pm Masahiko Sawada,  > wrote:
>
> > It works but I think we might want to add the unit kB for
> > understandability and consistency with other GUC_UNIT_KB parameters.
> > I've attached a small patch that adds the unit and aligns the indent
> > of the comments to the perimeter parameters.
> >
> > I'm not currently able to check, but if work_mem has a unit in the
> sample conf then I agree that vacuum_buffer_usage_limit should too.
>
> +1, work_mem and all other related options in this section have a unit in the
> sample conf, so adding this makes sense.
>

Agreed.
For the patch: the other GUCs have a tab instead of a space between the
unit and the "#" of the first comment (not the fault of this patch, but it
probably makes sense to fix that now).
Otherwise, LGTM

- Melanie


Re: [PATCH] Infinite loop while acquiring new TOAST Oid

2023-04-26 Thread Aleksander Alekseev
Hi,

> I agree that we can't simply widen varatt_external to use 8 bytes for
> the toast ID in all cases.

+1

Note that the user may have a table with multiple TOASTable
attributes. If we simply widen the TOAST pointer, it may break
existing tables in edge cases. This may also be a reason why certain
users would prefer to continue using narrow pointers. IMO wide
TOAST pointers should be a table option. Whether the default for new
tables should be wide or narrow pointers is debatable.

In another discussion [1] we seem to agree that we also want the
ability to include a 32-bit dictionary_id in the TOAST pointers and
perhaps support more compression methods (ZSTD, to name one). Besides
that, it would be nice to be able to extend TOAST pointers in the
future without breaking the existing pointers. One possible solution
would be to add a varint feature bitmask to every pointer, so we
could add flags like TOAST_IS_WIDE, TOAST_HAS_DICTIONARY,
TOAST_UNKNOWN_FEATURE_FROM_2077, etc. indefinitely.
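
As a rough standalone illustration of that idea (the flag names follow the
paragraph above, but the struct layout is entirely hypothetical and not a
proposal for the actual on-disk format):

#include <stdbool.h>
#include <stdint.h>

/* hypothetical feature flags for an extensible TOAST pointer */
#define TOAST_IS_WIDE			(1 << 0)	/* 64-bit value ID follows */
#define TOAST_HAS_DICTIONARY	(1 << 1)	/* 32-bit dictionary_id follows */

typedef struct toast_pointer_ext
{
	uint32_t	features;		/* bitmask of TOAST_* flags */
	/* variable-length payload, laid out according to the flags */
} toast_pointer_ext;

static bool
toast_pointer_has(const toast_pointer_ext *ptr, uint32_t flag)
{
	return (ptr->features & flag) != 0;
}

An old server seeing unknown bits set could then reject (or ignore) the
pointer explicitly instead of misinterpreting its layout.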

I suggest we address all the current and future needs once and
completely refactor TOAST pointers rather than solving one problem at
a time. I believe this will be more beneficial for the community in
the long term.

Thoughts?

[1]: https://commitfest.postgresql.org/43/3626/

-- 
Best regards,
Aleksander Alekseev




Re: pg_stat_io for the startup process

2023-04-26 Thread Robert Haas
On Wed, Apr 26, 2023 at 5:47 AM Kyotaro Horiguchi
 wrote:
> 3. When should we call pgstat_report_stats on the startup process?
>
> During recovery, I think we can call pgstat_report_stats() (or a
> subset of it) right before invoking WaitLatch and at segment
> boundaries.

I think this kind of idea is worth exploring. Andres mentioned timers,
but it seems to me that something where we just do it at certain
"convenient" points might be good enough and simpler. I'd much rather
have statistics that were up to date as of the last time we finished a
segment than have nothing at all.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Option to not use ringbuffer in VACUUM, using it in failsafe mode

2023-04-26 Thread Daniel Gustafsson
> On 26 Apr 2023, at 13:26, David Rowley  wrote:
> On Wed, 26 Apr 2023, 8:48 pm Masahiko Sawada,  > wrote:

> It works but I think we might want to add the unit kB for
> understandability and consistency with other GUC_UNIT_KB parameters.
> I've attached a small patch that adds the unit and aligns the indent
> of the comments to the perimeter parameters.
> 
> I'm not currently able to check, but if work_mem has a unit in the sample 
> conf then I agree that vacuum_buffer_usage_limit should too.

+1, work_mem and all other related options in this section have a unit in the
sample conf, so adding this makes sense.

--
Daniel Gustafsson





Re: Find dangling membership roles in pg_dumpall

2023-04-26 Thread Daniel Gustafsson
> On 26 Apr 2023, at 13:02, Daniel Gustafsson  wrote:
>> On 26 Apr 2023, at 12:18, Andreas 'ads' Scherbaum  wrote:

>> The attached patch fixes this problem, and updates prev_remaining inside
>> the loop.
> 
> Nice catch, that indeed seems like a proper fix.  This was introduced in
> ce6b672e44 and so doesn't need a backpatch.

Pushed, but without the comment extension as I felt the existing comment aptly
discussed the codepath. Thanks!

--
Daniel Gustafsson





Re: [PATCH] Extend the length of BackgroundWorker.bgw_library_name

2023-04-26 Thread Aleksander Alekseev
Hi,

> You're absolutely right. Here's v3.

Please avoid using top posting [1].

The commit message may require a bit of tweaking by the committer but
other than that the patch seems to be fine. I'm going to mark it as
RfC in a bit unless anyone objects.

[1]: https://wiki.postgresql.org/wiki/Mailing_Lists#Email_etiquette_mechanics

-- 
Best regards,
Aleksander Alekseev




RE: [PoC] pg_upgrade: allow to upgrade publisher node

2023-04-26 Thread Hayato Kuroda (Fujitsu)
Dear Peter,

> A suggestion: You could write some/most tests against test_decoding
> rather than the publication/subscription system.  That way, you can
> avoid many timing issues in the tests and you can check more exactly
> that the slots produce the output you want.  This would also help ensure
> that this new facility works for other logical decoding output plugins
> besides the built-in one.

Good point. I think almost all tests except the --check part can be rewritten.
PSA new patchset.

Additionally, I fixed the following:

- Added initialization for slot_arr.*. This is needed to check whether
  the entry has already been allocated, in get_logical_slot_infos().
  Previously a double-free occurred on some platforms.
- Fixed a condition in get_logical_slot_infos().
- Changed the expected size of the page header to the longer one
  (SizeOfXLogLongPHD). If the WAL page is the first one in the WAL segment
  file, the long header seems to be used.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED



v10-0001-pg_upgrade-Add-include-logical-replication-slots.patch
Description:  v10-0001-pg_upgrade-Add-include-logical-replication-slots.patch


v10-0002-Always-persist-to-disk-logical-slots-during-a-sh.patch
Description:  v10-0002-Always-persist-to-disk-logical-slots-during-a-sh.patch


v10-0003-pg_upgrade-Add-check-function-for-include-logica.patch
Description:  v10-0003-pg_upgrade-Add-check-function-for-include-logica.patch


v10-0004-Change-the-method-used-to-check-logical-replicat.patch
Description:  v10-0004-Change-the-method-used-to-check-logical-replicat.patch


Re: Doc limitation update proposal: include out-of-line OID usage per TOAST-ed columns

2023-04-26 Thread David Rowley
On Sun, 23 Apr 2023, 3:42 am Gurjeet Singh,  wrote:

> I anticipate that edits to Appendix K Postgres Limits will prompt
> improving the note in there about the maximum column limit, That note
> is too wordy, and sometimes confusing, especially for the audience
> that it's written for: newcomers to Postgres ecosystem.
>

I doubt it, but feel free to submit a patch yourself which improves it
without losing the information which the paragraph is trying to convey.

David



RE: Support logical replication of DDLs

2023-04-26 Thread Zhijie Hou (Fujitsu)
On Wednesday, April 26, 2023 2:32 PM Masahiko Sawada 
> Subject: Re: Support logical replication of DDLs
> 
> On Tue, Mar 28, 2023 at 3:22 PM houzj.f...@fujitsu.com
>  wrote:
> >
> > On Tuesday, March 28, 2023 1:41 PM Amit Kapila 
> wrote:
> > >
> > > On Mon, Mar 27, 2023 at 5:37 PM Amit Kapila 
> > > wrote:
> > > >
> > > > On Mon, Mar 27, 2023 at 12:07 PM Amit Kapila
> 
> > > wrote:
> > > > >
> > > > > On Mon, Mar 27, 2023 at 2:52 AM Tom Lane 
> wrote:
> > > > > >
> > > > >
> > > > > > I suggest taking a couple of steps back from the minutiae of the
> > > > > > patch, and spending some hard effort thinking about how the thing
> > > > > > would be controlled in a useful fashion (that is, a real design
> > > > > > for the filtering that was mentioned at the very outset), and
> > > > > > about the security issues, and about how we could get to a
> committable
> > > patch.
> > > > > >
> > > > >
> > > > > Agreed. I'll try to summarize the discussion we have till now on
> > > > > this and share my thoughts on the same in a separate email.
> > > > >
> > > >
> > > > The idea to control what could be replicated is to introduce a new
> > > > publication option 'ddl' along with current options 'publish' and
> > > > 'publish_via_partition_root'. The values of this new option could be
> > > > 'table', 'function', 'all', etc. Here 'all' enables the replication of
> > > > all supported DDL commands. Example usage for this would be:
> > > > Example:
> > > > Create a new publication with all ddl replication enabled:
> > > >   CREATE PUBLICATION pub1 FOR ALL TABLES with (ddl = 'all');
> > > >
> > > > Enable table ddl replication for an existing Publication:
> > > >   ALTER PUBLICATION pub2 SET (ddl = 'table');
> > > >
> > > > This is what seems to have been discussed but I think we can even
> > > > extend it to support based on operations/commands, say one would like
> > > > to publish only 'create' and 'drop' of tables. Then we can extend the
> > > > existing publish option to have values like 'create', 'alter', and
> > > > 'drop'.
> > > >
> > >
> > > The other idea could be to that for the new option ddl, we input command
> tags
> > > such that the replication will happen for those commands.
> > > For example, ALTER PUBLICATION pub2 SET (ddl = 'Create Table, Alter
> > > Table, ..'); This will obviate the need to have additional values like 
> > > 'create',
> 'alter',
> > > and 'drop' for publish option.
> > >
> > > The other thought related to filtering is that one might want to filter 
> > > DDLs
> and
> > > or DMLs performed by specific roles in the future. So, we then need to
> > > introduce another option ddl_role, or something like that.
> > >
> > > Can we think of some other kind of filter for DDL replication?
> >
> > I am thinking of another generic syntax for ddl replication like:
> >
> > --
> > CREATE PUBLICATION pubname FOR object_type object_name with (publish
> = 'ddl_type');
> > --
> >
> > To replicate DDLs that happened on a table, we don't need to add new syntax
> or
> > option, we can extend the value for the 'publish' option like:
> >
> > To support more non-table objects replication, we can follow the same style
> and write it like:
> > --
> > CREATE PUBLICATION FOR FUNCTION f1 with (publish = 'alter'); -- function
> > CREATE PUBLICATION FOR ALL OPERATORS IN SCHEMA op_schema with
> (publish = 'drop'); -- operators
> > CREATE PUBLICATION FOR ALL OBJECTS with (publish = 'alter, create, drop');
> -- everything
> > --
> >
> > In this approach, we extend the publication grammar and users can
> > filter the object schema, object name, object type and ddltype. We can also
> add
> > more options to filter role or other infos in the future.
> 
> In this approach, does the subscriber need to track what objects have
> been subscribed similar to tables? For example, suppose that we
> created a publication for function func1 and created a subscription
> for the publication. What if we add function func2 to the publication?
> If we follow the current behavior, DDLs for func2 will be replicated
> to the subscriber but the subscriber won't apply it unless we refresh
> the publication of the subscription. So it seems to me that the
> subscriber needs to have a list of subscribed functions, and we will
> end up having lists of all types of objects.

Yes, if we follow the current behavior, we need to have lists of all types of
objects on the subscriber.

But I think that even if we only support replicating all DDLs, when doing the
initial sync for FUNCTIONS we would still need to skip the ALTER FUNCTION for
functions that are being synced (created).

Best Regards,
Hou zj


Re: Option to not use ringbuffer in VACUUM, using it in failsafe mode

2023-04-26 Thread David Rowley
On Wed, 26 Apr 2023, 8:48 pm Masahiko Sawada,  wrote:

> I realized that the value of vacuum_buffer_usage_limit parameter in
> postgresql.conf.sample doesn't have the unit:
>
> #vacuum_buffer_usage_limit = 256 # size of vacuum and analyze buffer
> access strategy ring.
>  # 0 to disable vacuum buffer access
> strategy
>  # range 128kB to 16GB
>
> It works but I think we might want to add the unit kB for
> understandability and consistency with other GUC_UNIT_KB parameters.
> I've attached a small patch that adds the unit and aligns the indent
> of the comments to the perimeter parameters.
>

I'm not currently able to check, but if work_mem has a unit in the sample
conf then I agree that vacuum_buffer_usage_limit should too.

I'm fine for you to go ahead and adjust this, otherwise it'll be Monday
before I can.

David



Re: Perform streaming logical transactions by background workers and parallel apply

2023-04-26 Thread Amit Kapila
On Wed, Apr 26, 2023 at 4:11 PM Zhijie Hou (Fujitsu)
 wrote:
>
> On Wednesday, April 26, 2023 5:00 PM Alexander Lakhin  
> wrote:
>
> Thanks for reporting the issue.
>
> I think the problem is that it tried to release locks in
> logicalrep_worker_onexit() before the initialization of the process was
> complete, because this callback function was registered before the init
> phase. So I think we can add a conditional statement before releasing
> locks. Please find the attached patch.
>

Yeah, this should work. Yet another possibility is to introduce a new
variable 'InitializingApplyWorker' similar to
'InitializingParallelWorker' and use that to prevent releasing locks.

-- 
With Regards,
Amit Kapila.




Re: Add two missing tests in 035_standby_logical_decoding.pl

2023-04-26 Thread Drouvot, Bertrand

Hi,

On 4/26/23 11:12 AM, vignesh C wrote:

On Wed, 26 Apr 2023 at 13:45, Drouvot, Bertrand

There was one typo in the commit message, subscribtion should be
subscription; the rest of the changes look good to me:
Subject: [PATCH v5] Add subscribtion to the standby test in
  035_standby_logical_decoding.pl

Adding one test, to verify that subscribtion to the standby is possible.



Oops, at least I repeated it twice ;-)
Fixed in V6 that I just shared up-thread.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Add two missing tests in 035_standby_logical_decoding.pl

2023-04-26 Thread Drouvot, Bertrand

Hi,

On 4/26/23 12:27 PM, Alvaro Herrera wrote:

diff --git a/src/test/perl/PostgreSQL/Test/Cluster.pm 
b/src/test/perl/PostgreSQL/Test/Cluster.pm
index 6f7f4e5de4..819667d42a 100644
--- a/src/test/perl/PostgreSQL/Test/Cluster.pm
+++ b/src/test/perl/PostgreSQL/Test/Cluster.pm
@@ -2644,7 +2644,16 @@ sub wait_for_catchup
}
if (!defined($target_lsn))
{
-   $target_lsn = $self->lsn('write');
+   my $isrecovery = $self->safe_psql('postgres', "SELECT 
pg_is_in_recovery()");
+   chomp($isrecovery);
+   if ($isrecovery eq 't')
+   {
+   $target_lsn = $self->lsn('replay');
+   }
+   else
+   {
+   $target_lsn = $self->lsn('write');
+   }


Please modify the function's documentation to account for this code change.



Good point, thanks! Done in V6 attached.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.comFrom 2a8b8bdc32671b33b2b7c0fa1a79f8d7624ae4a6 Mon Sep 17 00:00:00 2001
From: Bertrand Drouvot 
Date: Mon, 24 Apr 2023 05:13:23 +
Subject: [PATCH v6] Add subscription to the standby test in
 035_standby_logical_decoding.pl

Adding one test, to verify that subscription to the standby is possible.
---
 src/test/perl/PostgreSQL/Test/Cluster.pm  | 21 -
 .../t/035_standby_logical_decoding.pl | 91 ++-
 2 files changed, 107 insertions(+), 5 deletions(-)
  18.8% src/test/perl/PostgreSQL/Test/
  81.1% src/test/recovery/t/

diff --git a/src/test/perl/PostgreSQL/Test/Cluster.pm 
b/src/test/perl/PostgreSQL/Test/Cluster.pm
index 6f7f4e5de4..9117554c07 100644
--- a/src/test/perl/PostgreSQL/Test/Cluster.pm
+++ b/src/test/perl/PostgreSQL/Test/Cluster.pm
@@ -2611,8 +2611,14 @@ When doing physical replication, the standby is usually 
identified by
 passing its PostgreSQL::Test::Cluster instance.  When doing logical
 replication, standby_name identifies a subscription.
 
-The default value of target_lsn is $node->lsn('write'), which ensures
-that the standby has caught up to what has been committed on the primary.
+When not in recovery, the default value of target_lsn is $node->lsn('write'),
+which ensures that the standby has caught up to what has been committed on
+the primary.
+
+When in recovery, the default value of target_lsn is $node->lsn('replay')
+instead. This is needed when the publisher passed to 
wait_for_subscription_sync()
+is a standby node.
+
 If you pass an explicit value of target_lsn, it should almost always be
 the primary's write LSN; so this parameter is seldom needed except when
 querying some intermediate replication node rather than the primary.
@@ -2644,7 +2650,16 @@ sub wait_for_catchup
}
if (!defined($target_lsn))
{
-   $target_lsn = $self->lsn('write');
+   my $isrecovery = $self->safe_psql('postgres', "SELECT 
pg_is_in_recovery()");
+   chomp($isrecovery);
+   if ($isrecovery eq 't')
+   {
+   $target_lsn = $self->lsn('replay');
+   }
+   else
+   {
+   $target_lsn = $self->lsn('write');
+   }
}
print "Waiting for replication conn "
  . $standby_name . "'s "
diff --git a/src/test/recovery/t/035_standby_logical_decoding.pl 
b/src/test/recovery/t/035_standby_logical_decoding.pl
index b8f5311fe9..f6d6447412 100644
--- a/src/test/recovery/t/035_standby_logical_decoding.pl
+++ b/src/test/recovery/t/035_standby_logical_decoding.pl
@@ -10,12 +10,17 @@ use PostgreSQL::Test::Cluster;
 use PostgreSQL::Test::Utils;
 use Test::More;
 
-my ($stdin, $stdout, $stderr, $cascading_stdout, $cascading_stderr, $ret, 
$handle, $slot);
+my ($stdin, $stdout,$stderr,
+   $cascading_stdout,  $cascading_stderr,  $subscriber_stdin,
+   $subscriber_stdout, $subscriber_stderr, $ret,
+   $handle,$slot);
 
 my $node_primary = PostgreSQL::Test::Cluster->new('primary');
 my $node_standby = PostgreSQL::Test::Cluster->new('standby');
 my $node_cascading_standby = 
PostgreSQL::Test::Cluster->new('cascading_standby');
+my $node_subscriber = PostgreSQL::Test::Cluster->new('subscriber');
 my $default_timeout = $PostgreSQL::Test::Utils::timeout_default;
+my $psql_timeout= IPC::Run::timer($default_timeout);
 my $res;
 
 # Name for the physical slot on primary
@@ -267,7 +272,8 @@ $node_standby->init_from_backup(
has_streaming => 1,
has_restoring => 1);
 $node_standby->append_conf('postgresql.conf',
-   qq[primary_slot_name = '$primary_slotname']);
+   qq[primary_slot_name = '$primary_slotname'
+   max_replication_slots = 5]);
 $node_standby->start;
 $node_primary->wait_for_replay_catchup($node_standby);
 $node_standby->safe_psql('testdb', qq[SELECT * FROM 
pg_create_physical_replication_slo

Re: Find dangling membership roles in pg_dumpall

2023-04-26 Thread Daniel Gustafsson
> On 26 Apr 2023, at 12:18, Andreas 'ads' Scherbaum  wrote:
> 
> 
> Hello,
> 
> pg_dumpall.c has a function dumpRoleMembership() which dumps all
> membership roles. This function includes a piece of code which checks
> if the membership tree has an open end which can't be resolved.
> However that code is never used.
> 
> The variable prev_remaining is initially set to 0, and then never changed.
> Which in turn never executes this code:
> 
> if (remaining == prev_remaining)
> 
> because the loop is only entered while "remaining > 0".
> 
> The attached patch fixes this problem, and updates prev_remaining inside
> the loop.

Nice catch, that indeed seems like a proper fix.  This was introduced in
ce6b672e44 and so doesn't need a backpatch.

--
Daniel Gustafsson





RE: Perform streaming logical transactions by background workers and parallel apply

2023-04-26 Thread Zhijie Hou (Fujitsu)
On Wednesday, April 26, 2023 5:00 PM Alexander Lakhin  
wrote:
> Please look at a new anomaly that can be observed starting from 216a7848.
> 
> The following script:
> echo "CREATE SUBSCRIPTION testsub CONNECTION 'dbname=nodb'
> PUBLICATION testpub WITH (connect = false);
> ALTER SUBSCRIPTION testsub ENABLE;" | psql
> 
> sleep 1
> rm $PGINST/lib/libpqwalreceiver.so
> sleep 15
> pg_ctl -D "$PGDB" stop -m immediate
> grep 'TRAP:' server.log
> 
> Leads to multiple assertion failures:
> CREATE SUBSCRIPTION
> ALTER SUBSCRIPTION
> waiting for server to shut down done
> server stopped
> TRAP: failed Assert("MyProc->backendId != InvalidBackendId"), File: "lock.c",
> Line: 4439, PID: 2899323
> TRAP: failed Assert("MyProc->backendId != InvalidBackendId"), File: "lock.c",
> Line: 4439, PID: 2899416
> TRAP: failed Assert("MyProc->backendId != InvalidBackendId"), File: "lock.c",
> Line: 4439, PID: 2899427
> TRAP: failed Assert("MyProc->backendId != InvalidBackendId"), File: "lock.c",
> Line: 4439, PID: 2899439
> TRAP: failed Assert("MyProc->backendId != InvalidBackendId"), File: "lock.c",
> Line: 4439, PID: 2899538
> TRAP: failed Assert("MyProc->backendId != InvalidBackendId"), File: "lock.c",
> Line: 4439, PID: 2899547
> 
> server.log contains:
> 2023-04-26 11:00:58.797 MSK [2899300] LOG:  database system is ready to
> accept connections
> 2023-04-26 11:00:58.821 MSK [2899416] ERROR:  could not access file
> "libpqwalreceiver": No such file or directory
> TRAP: failed Assert("MyProc->backendId != InvalidBackendId"), File: "lock.c",
> Line: 4439, PID: 2899416
> postgres: logical replication apply worker for subscription 16385
> (ExceptionalCondition+0x69)[0x558b2ac06d41]
> postgres: logical replication apply worker for subscription 16385
> (VirtualXactLockTableCleanup+0xa4)[0x558b2aa9fd74]
> postgres: logical replication apply worker for subscription 16385
> (LockReleaseAll+0xbb)[0x558b2aa9fe7d]
> postgres: logical replication apply worker for subscription 16385
> (+0x4588c6)[0x558b2aa2a8c6]
> postgres: logical replication apply worker for subscription 16385
> (shmem_exit+0x6c)[0x558b2aa87eb1]
> postgres: logical replication apply worker for subscription 16385
> (+0x4b5faa)[0x558b2aa87faa]
> postgres: logical replication apply worker for subscription 16385
> (proc_exit+0xc)[0x558b2aa88031]
> postgres: logical replication apply worker for subscription 16385
> (StartBackgroundWorker+0x147)[0x558b2aa0b4d9]
> postgres: logical replication apply worker for subscription 16385
> (+0x43fdc1)[0x558b2aa11dc1]
> postgres: logical replication apply worker for subscription 16385
> (+0x43ff3d)[0x558b2aa11f3d]
> postgres: logical replication apply worker for subscription 16385
> (+0x440866)[0x558b2aa12866]
> postgres: logical replication apply worker for subscription 16385
> (+0x440e12)[0x558b2aa12e12]
> postgres: logical replication apply worker for subscription 16385
> (BackgroundWorkerInitializeConnection+0x0)[0x558b2aa14396]
> postgres: logical replication apply worker for subscription 16385
> (main+0x21a)[0x558b2a932e21]
> 
> I understand that removing libpqwalreceiver.so (or the whole pginst/) is not
> what happens in a production environment every day, but nonetheless it's a
> new failure mode and it can produce many coredumps when testing.
> 
> IIUC, that assert will fail in case of any error raised between
> ApplyWorkerMain()->logicalrep_worker_attach()->before_shmem_exit() and
> ApplyWorkerMain()->InitializeApplyWorker()->BackgroundWorkerInitializeC
> onnectionByOid()->InitPostgres().

Thanks for reporting the issue.

I think the problem is that it tried to release locks in
logicalrep_worker_onexit() before the initialization of the process was
complete, because this callback function was registered before the init
phase. So I think we can add a conditional statement before releasing locks.
Please find the attached patch.
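
In sketch form, the guard looks something like this (illustrative only; the
exact condition used in the attached patch may differ):

static void
logicalrep_worker_onexit(int code, Datum arg)
{
	/*
	 * If the worker errored out before InitPostgres() completed, it never
	 * acquired a backend ID or any locks, so skip the lock release that
	 * would otherwise trip Assert(MyProc->backendId != InvalidBackendId)
	 * in VirtualXactLockTableCleanup().
	 */
	if (MyProc != NULL && MyProc->backendId != InvalidBackendId)
		LockReleaseAll(DEFAULT_LOCKMETHOD, true);

	/* ... rest of the existing cleanup ... */
}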

Best Regards,
Hou zj



0001-fix-assert-failure-in-logical-replication-apply-work.patch
Description:  0001-fix-assert-failure-in-logical-replication-apply-work.patch


Re: Add two missing tests in 035_standby_logical_decoding.pl

2023-04-26 Thread Alvaro Herrera
> diff --git a/src/test/perl/PostgreSQL/Test/Cluster.pm 
> b/src/test/perl/PostgreSQL/Test/Cluster.pm
> index 6f7f4e5de4..819667d42a 100644
> --- a/src/test/perl/PostgreSQL/Test/Cluster.pm
> +++ b/src/test/perl/PostgreSQL/Test/Cluster.pm
> @@ -2644,7 +2644,16 @@ sub wait_for_catchup
>   }
>   if (!defined($target_lsn))
>   {
> - $target_lsn = $self->lsn('write');
> + my $isrecovery = $self->safe_psql('postgres', "SELECT 
> pg_is_in_recovery()");
> + chomp($isrecovery);
> + if ($isrecovery eq 't')
> + {
> + $target_lsn = $self->lsn('replay');
> + }
> + else
> + {
> + $target_lsn = $self->lsn('write');
> + }

Please modify the function's documentation to account for this code change.


-- 
Álvaro HerreraBreisgau, Deutschland  —  https://www.EnterpriseDB.com/
"Porque Kim no hacía nada, pero, eso sí,
con extraordinario éxito" ("Kim", Kipling)




Re: Doc limitation update proposal: include out-of-line OID usage per TOAST-ed columns

2023-04-26 Thread Jakub Wartak
Hi,

>> These 2 discussions show that it's a painful experience to run into
>> this problem, and that the hackers have ideas on how to fix it, but
>> those fixes haven't materialized for years. So I would say that, yes,
>> this info belongs in the hard-limits section, because who knows how
>> long it'll take this to be fixed.
>>
>> Please submit a patch.
>>
> This is a production case for large databases with high update rates, but it
> is often mistaken for reaching the table size limit, although the size limit
> itself is handled correctly.
>
> The note on the TOAST limitation does not mention that TOAST values are not
> actually updated by an UPDATE operation - the old value is marked as dead and
> a new one is inserted, and dead values must be vacuumed before their OIDs can
> be reused. Worst of all, an INSERT/UPDATE does not fail when no OID is
> available - it spins in an infinite loop looking for a free OID.

OK, so here is the documentation patch proposal. I've also added two rows
touching on pg_largeobject, as it is also related to the OID topic. Please
feel free to send adjusted patches.
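
For context, the loop in question has roughly the following shape (a
simplified standalone sketch modeled on GetNewOidWithIndex(), not the
actual source):

/* stand-in for the probe of the TOAST table's chunk_id index */
static bool
oid_in_use(Oid oid)
{
	/* index probe elided in this sketch */
	return false;
}

static Oid
get_new_toast_oid_sketch(void)
{
	Oid			newOid;

	do
	{
		newOid = GetNewObjectId();	/* 32-bit counter, wraps around */
	} while (oid_in_use(newOid));	/* never terminates once all ~2^32 OIDs
									 * are taken; dead-but-unvacuumed TOAST
									 * values still count as "in use" */

	return newOid;
}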

Regards,
-J.


v1-0001-doc-Add-some-OID-TOAST-related-limitations-to-the.patch
Description: Binary data


Find dangling membership roles in pg_dumpall

2023-04-26 Thread Andreas 'ads' Scherbaum


Hello,

pg_dumpall.c has a function dumpRoleMembership() which dumps all
membership roles. This function includes a piece of code which checks
if the membership tree has an open end which can't be resolved.
However that code is never used.

The variable prev_remaining is initially set to 0, and then never changed.
Which in turn never executes this code:

if (remaining == prev_remaining)

because the loop is only entered while "remaining > 0".

The attached patch fixes this problem, and updates prev_remaining inside
the loop.


Co-Author: Artur Zakirov 
who reviewed the patch.


Regards

--
Andreas 'ads' Scherbaum
German PostgreSQL User Group
European PostgreSQL User Group - Board of Directors
Volunteer Regional Contact, Germany - PostgreSQL Project
diff --git a/src/bin/pg_dump/pg_dumpall.c b/src/bin/pg_dump/pg_dumpall.c
index 71a1319865..1e2a32c39a 100644
--- a/src/bin/pg_dump/pg_dumpall.c
+++ b/src/bin/pg_dump/pg_dumpall.c
@@ -1044,6 +1044,9 @@ dumpRoleMembership(PGconn *conn)
 			 * graph whose vertices are grants and whose edges point from
 			 * grantors to members should be connected and acyclic. If we fail
 			 * to make progress, either we or the server have messed up.
+			 * Initially prev_remaining is 0, while remaining can't be 0.
+			 * If a subsequent pass over the grants makes no progress, the
+			 * loop aborts here.
 			 */
 			if (remaining == prev_remaining)
 			{
@@ -1052,6 +1055,7 @@ dumpRoleMembership(PGconn *conn)
 PQfinish(conn);
 exit_nicely(1);
 			}
+			prev_remaining = remaining;
 
 			/* Make one pass over the grants for this role. */
 			for (i = start; i < end; ++i)


RE: Add two missing tests in 035_standby_logical_decoding.pl

2023-04-26 Thread Yu Shi (Fujitsu)
On Mon, Apr 24, 2023 8:07 PM Drouvot, Bertrand  
wrote:
> 
> On 4/24/23 11:45 AM, Amit Kapila wrote:
> > On Mon, Apr 24, 2023 at 11:54 AM Amit Kapila 
> wrote:
> >>
> >> On Mon, Apr 24, 2023 at 11:24 AM Drouvot, Bertrand
> >>  wrote:
> >>>
> >>
> >> Few comments:
> >> 
> >>
> >
> > +# We can not test if the WAL file still exists immediately.
> > +# We need to let some time to the standby to actually "remove" it.
> > +my $i = 0;
> > +while (1)
> > +{
> > + last if !-f $standby_walfile;
> > + if ($i++ == 10 * $default_timeout)
> > + {
> > + die
> > +   "could not determine if WAL file has been retained or not, can't 
> > continue";
> > + }
> > + usleep(100_000);
> > +}
> >
> > Is this adhoc wait required because we can't guarantee that the
> > checkpoint is complete on standby even after using wait_for_catchup?
> 
> Yes, the restart point on the standby is not necessary completed even after
> wait_for_catchup is done.
> 

I think that's because when replaying a checkpoint record, the startup process
of the standby only saves the checkpoint information, and we need to wait for
the checkpointer to perform a restartpoint (see RecoveryRestartPoint), right? If
so, could we force a checkpoint on the standby? After that, the standby should
have completed the restartpoint and we wouldn't need to wait.

Besides, would it be better to wait for the cascading standby? If the WAL file
needed by the cascading standby is removed on the standby, the subsequent test
will fail. Do we need to consider this scenario? I saw the following error
message after setting recovery_min_apply_delay to 5s on the cascading standby,
and the test failed due to a timeout while waiting for the cascading standby.

Log of cascading standby node:
FATAL:  could not receive data from WAL stream: ERROR:  requested WAL segment 
00010003 has already been removed

Regards,
Shi Yu


Re: pg_stat_io for the startup process

2023-04-26 Thread Kyotaro Horiguchi
At Tue, 25 Apr 2023 16:04:23 -0700, Andres Freund  wrote in 
> I refreshed my memory: The startup process has indeed behaved that way for
> much longer than pg_stat_io existed - but it's harder to spot, because the
> stats are more coarsely aggregated :/. And it's very oddly inconsistent:
> 
> The startup process doesn't report per-relation read/hit (it might when we
> create a fake relcache entry, to lazy to see what happens exactly), because we

The key difference lies between relation-level and smgr-level;
recovery doesn't call ReadBufferExtended.

> key those stats by oid. However, it *does* report the read/write time. But
> only at process exit, of course.  The weird part is that the startup process
> does *NOT* increase pg_stat_database.blks_read/blks_hit, because instead of
> basing those on pgBufferUsage.shared_blks_read etc, we compute them based on
> the relation level stats. pgBufferUsage is just used for EXPLAIN.  This isn't
> recent, afaict.

I see four issues here.

1. The current database stats omit buffer fetches that don't
  originate from a relation.

In this case pgstat_relation can't work since recovery isn't aware
of relids.  We might be able to resolve a relfilenode into a relid, but
it may not be that simple. Fortunately we already count fetches and
hits process-wide using pgBufferUsage, so we can use this for database
stats.

2. Even if we wanted to report stats for the startup process,
  pgstat_report_stats wouldn't permit it since transaction-end
  timestamp doesn't advance.

I'm not certain if it's the correct approach, but perhaps we could use
GetCurrentTimestamp() instead of GetCurrentTransactionStopTimestamp()
specifically for the startup process.
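
Something along these lines inside pgstat_report_stat() (illustrative only;
the AmStartupProcess() test and its placement are my assumption, not actual
source):

	/*
	 * The startup process never ends a transaction, so its transaction
	 * stop timestamp never advances; fall back to the wall clock for it.
	 */
	if (AmStartupProcess())
		now = GetCurrentTimestamp();
	else
		now = GetCurrentTransactionStopTimestamp();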

3. When should we call pgstat_report_stats on the startup process?

During recovery, I think we can call pgstat_report_stats() (or a
subset of it) right before invoking WaitLatch and at segment
boundaries.

4. In the existing ReadBuffer_common, there's an inconsistency in
counting hits and reads between pgstat_io and pgBufferUsage.

The difference comes from the case of RBM_ZERO pages. We should simply
align them.


> TL;DR: Currently the startup process maintains blk_read_time, blk_write_time,
> but doesn't maintain blks_read, blks_hit - which doesn't make sense.

As a result, the attached patch, which is meant for discussion, allows
pg_stat_database to show fetches and reads by the startup process as
the counts for the database with id 0.

There's still some difference between pg_stat_io and pg_stat_database,
but I haven't examined it in detail.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/backend/access/transam/xlogrecovery.c 
b/src/backend/access/transam/xlogrecovery.c
index 188f6d6f85..6010ae8662 100644
--- a/src/backend/access/transam/xlogrecovery.c
+++ b/src/backend/access/transam/xlogrecovery.c
@@ -3248,6 +3248,7 @@ XLogPageRead(XLogReaderState *xlogreader, XLogRecPtr 
targetPagePtr, int reqLen,
close(readFile);
readFile = -1;
readSource = XLOG_FROM_ANY;
+   pgstat_report_stat(false);
}
 
XLByteToSeg(targetPagePtr, readSegNo, wal_segment_size);
@@ -3617,6 +3618,7 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool 
randAccess,
/* Do background tasks that 
might benefit us later. */

KnownAssignedTransactionIdsIdleMaintenance();
 
+   pgstat_report_stat(false);
(void) 
WaitLatch(&XLogRecoveryCtl->recoveryWakeupLatch,

 WL_LATCH_SET | WL_TIMEOUT |

 WL_EXIT_ON_PM_DEATH,
@@ -3893,6 +3895,7 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool 
randAccess,
 * Wait for more WAL to arrive, when we 
will be woken
 * immediately by the WAL receiver.
 */
+   pgstat_report_stat(false);
(void) 
WaitLatch(&XLogRecoveryCtl->recoveryWakeupLatch,
 
WL_LATCH_SET | WL_EXIT_ON_PM_DEATH,
 -1L,
diff --git a/src/backend/storage/buffer/bufmgr.c 
b/src/backend/storage/buffer/bufmgr.c
index 1fa689052e..520936d0dd 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -1046,9 +1046,6 @@ ReadBuffer_common(SMgrRelation smgr, char relpersistence, 
ForkNumber forkNum,
bufHdr = LocalBufferAlloc(smgr, forkNum, blockNum, &found);
if (found)
pgBufferUsage.local_blks_hit++

Re: Support logical replication of DDLs

2023-04-26 Thread vignesh C
On Wed, 26 Apr 2023 at 12:02, Masahiko Sawada  wrote:
>
> On Tue, Mar 28, 2023 at 3:22 PM houzj.f...@fujitsu.com
>  wrote:
> >
> > On Tuesday, March 28, 2023 1:41 PM Amit Kapila  
> > wrote:
> > >
> > > On Mon, Mar 27, 2023 at 5:37 PM Amit Kapila 
> > > wrote:
> > > >
> > > > On Mon, Mar 27, 2023 at 12:07 PM Amit Kapila 
> > > wrote:
> > > > >
> > > > > On Mon, Mar 27, 2023 at 2:52 AM Tom Lane  wrote:
> > > > > >
> > > > >
> > > > > > I suggest taking a couple of steps back from the minutiae of the
> > > > > > patch, and spending some hard effort thinking about how the thing
> > > > > > would be controlled in a useful fashion (that is, a real design
> > > > > > for the filtering that was mentioned at the very outset), and
> > > > > > about the security issues, and about how we could get to a 
> > > > > > committable
> > > patch.
> > > > > >
> > > > >
> > > > > Agreed. I'll try to summarize the discussion we have till now on
> > > > > this and share my thoughts on the same in a separate email.
> > > > >
> > > >
> > > > The idea to control what could be replicated is to introduce a new
> > > > publication option 'ddl' along with current options 'publish' and
> > > > 'publish_via_partition_root'. The values of this new option could be
> > > > 'table', 'function', 'all', etc. Here 'all' enables the replication of
> > > > all supported DDL commands. Example usage for this would be:
> > > > Example:
> > > > Create a new publication with all ddl replication enabled:
> > > >   CREATE PUBLICATION pub1 FOR ALL TABLES with (ddl = 'all');
> > > >
> > > > Enable table ddl replication for an existing Publication:
> > > >   ALTER PUBLICATION pub2 SET (ddl = 'table');
> > > >
> > > > This is what seems to have been discussed but I think we can even
> > > > extend it to support based on operations/commands, say one would like
> > > > to publish only 'create' and 'drop' of tables. Then we can extend the
> > > > existing publish option to have values like 'create', 'alter', and
> > > > 'drop'.
> > > >
> > >
> > > The other idea could be to that for the new option ddl, we input command 
> > > tags
> > > such that the replication will happen for those commands.
> > > For example, ALTER PUBLICATION pub2 SET (ddl = 'Create Table, Alter
> > > Table, ..'); This will obviate the need to have additional values like 
> > > 'create', 'alter',
> > > and 'drop' for publish option.
> > >
> > > The other thought related to filtering is that one might want to filter 
> > > DDLs and
> > > or DMLs performed by specific roles in the future. So, we then need to
> > > introduce another option ddl_role, or something like that.
> > >
> > > Can we think of some other kind of filter for DDL replication?
> >
> > I am thinking of another generic syntax for ddl replication like:
> >
> > --
> > CREATE PUBLICATION pubname FOR object_type object_name with (publish = 
> > 'ddl_type');
> > --
> >
> > To replicate DDLs that happened on a table, we don't need to add new syntax 
> > or
> > option, we can extend the value for the 'publish' option like:
> >
> > To support more non-table objects replication, we can follow the same style 
> > and write it like:
> > --
> > CREATE PUBLICATION FOR FUNCTION f1 with (publish = 'alter'); -- function
> > CREATE PUBLICATION FOR ALL OPERATORS IN SCHEMA op_schema with (publish = 
> > 'drop'); -- operators
> > CREATE PUBLICATION FOR ALL OBJECTS with (publish = 'alter, create, drop'); 
> > -- everything
> > --
> >
> > In this approach, we extend the publication grammar and users can
> > filter the object schema, object name, object type and ddltype. We can also 
> > add
> > more options to filter role or other infos in the future.
>
> In this approach, does the subscriber need to track what objects have
> been subscribed similar to tables? For example, suppose that we
> created a publication for function func1 and created a subscription
> for the publication. What if we add function func2 to the publication?
> If we follow the current behavior, DDLs for func2 will be replicated
> to the subscriber but the subscriber won't apply it unless we refresh
> the publication of the subscription. So it seems to me that the
> subscriber needs to have a list of subscribed functions, and we will
> end up having lists of all types of objects.
>
> >
> > 
> >
> > One more alternative could be like:
> > CREATE PUBLICATION xx FOR pub_create_alter_table WITH (ddl = 
> > 'table:create,alter'); -- it will publish create table and alter table 
> > operations.
> > CREATE PUBLICATION xx FOR pub_all_table WITH (ddl = 'table:all'); -- This 
> > means all table operations create/alter/drop
> > CREATE PUBLICATION xx FOR pub_all_table WITH (ddl = 'table'); -- same as 
> > above
> >
> > This can be extended later to:
> > CREATE PUBLICATION xx FOR pub_all_func WITH (ddl = 'function:all');
> > CREATE PUBLICATION xx FOR pub_create_trigger (ddl = 'trigger:create');
> >
> > In this approach, we don't need to add mo

Re: Add two missing tests in 035_standby_logical_decoding.pl

2023-04-26 Thread vignesh C
On Wed, 26 Apr 2023 at 13:45, Drouvot, Bertrand
 wrote:
>
> Hi,
>
> On 4/26/23 6:06 AM, vignesh C wrote:
> > On Tue, 25 Apr 2023 at 12:51, Drouvot, Bertrand
> >  wrote:
> > Thanks for the updated patch.
> > Few comments:
>
> Thanks for looking at it!
>
> > 1) subscriber_stdout and  subscriber_stderr are not required for this
> > test case, we could remove it, I was able to remove those variables
> > and run the test successfully:
> > +$node_subscriber->start;
> > +
> > +my %psql_subscriber = (
> > +   'subscriber_stdin'  => '',
> > +   'subscriber_stdout' => '',
> > +   'subscriber_stderr' => '');
> > +$psql_subscriber{run} = IPC::Run::start(
> > +   [ 'psql', '-XA', '-f', '-', '-d',
> > $node_subscriber->connstr('postgres') ],
> > +   '<',
> > +   \$psql_subscriber{subscriber_stdin},
> > +   '>',
> > +   \$psql_subscriber{subscriber_stdout},
> > +   '2>',
> > +   \$psql_subscriber{subscriber_stderr},
> > +   $psql_timeout);
> >
> > I ran it like:
> > my %psql_subscriber = (
> > 'subscriber_stdin' => '');
> > $psql_subscriber{run} = IPC::Run::start(
> > [ 'psql', '-XA', '-f', '-', '-d', $node_subscriber->connstr('postgres') ],
> > '<',
> > \$psql_subscriber{subscriber_stdin},
> > $psql_timeout);
> >
>
> Not using the 3 std* is also the case for example in 021_row_visibility.pl 
> and 032_relfilenode_reuse.pl
> where the "stderr" is set but does not seem to be used.
>
> I don't think that's a problem to keep them all and I think it's better to 
> have
> them re-directed to dedicated places.

OK, that way it will be consistent with the others too.

> > 2) Also we have changed the default timeout here, why is this change 
> > required:
> >   my $node_cascading_standby =
> > PostgreSQL::Test::Cluster->new('cascading_standby');
> > +my $node_subscriber = PostgreSQL::Test::Cluster->new('subscriber');
> >   my $default_timeout = $PostgreSQL::Test::Utils::timeout_default;
> > +my $psql_timeout= IPC::Run::timer(2 * $default_timeout);
>
> I think I used 021_row_visibility.pl as an example. But agree there is
> others .pl that are using the timeout_default as the psql_timeout and that
> the default is enough in our case. So, using the default in V5 attached.
>

Thanks for fixing this.

There was one typo in the commit message, subscribtion should be
subscription; the rest of the changes look good to me:
Subject: [PATCH v5] Add subscribtion to the standby test in
 035_standby_logical_decoding.pl

Adding one test, to verify that subscribtion to the standby is possible.

Regards,
Vignesh




Re: Perform streaming logical transactions by background workers and parallel apply

2023-04-26 Thread Alexander Lakhin

Hello hackers,

Please look at a new anomaly that can be observed starting from 216a7848.

The following script:
echo "CREATE SUBSCRIPTION testsub CONNECTION 'dbname=nodb' PUBLICATION testpub 
WITH (connect = false);
ALTER SUBSCRIPTION testsub ENABLE;" | psql

sleep 1
rm $PGINST/lib/libpqwalreceiver.so
sleep 15
pg_ctl -D "$PGDB" stop -m immediate
grep 'TRAP:' server.log

Leads to multiple assertion failures:
CREATE SUBSCRIPTION
ALTER SUBSCRIPTION
waiting for server to shut down done
server stopped
TRAP: failed Assert("MyProc->backendId != InvalidBackendId"), File: "lock.c", 
Line: 4439, PID: 2899323
TRAP: failed Assert("MyProc->backendId != InvalidBackendId"), File: "lock.c", 
Line: 4439, PID: 2899416
TRAP: failed Assert("MyProc->backendId != InvalidBackendId"), File: "lock.c", 
Line: 4439, PID: 2899427
TRAP: failed Assert("MyProc->backendId != InvalidBackendId"), File: "lock.c", 
Line: 4439, PID: 2899439
TRAP: failed Assert("MyProc->backendId != InvalidBackendId"), File: "lock.c", 
Line: 4439, PID: 2899538
TRAP: failed Assert("MyProc->backendId != InvalidBackendId"), File: "lock.c", 
Line: 4439, PID: 2899547

server.log contains:
2023-04-26 11:00:58.797 MSK [2899300] LOG:  database system is ready to accept 
connections
2023-04-26 11:00:58.821 MSK [2899416] ERROR:  could not access file 
"libpqwalreceiver": No such file or directory
TRAP: failed Assert("MyProc->backendId != InvalidBackendId"), File: "lock.c", 
Line: 4439, PID: 2899416
postgres: logical replication apply worker for subscription 16385 
(ExceptionalCondition+0x69)[0x558b2ac06d41]
postgres: logical replication apply worker for subscription 16385 
(VirtualXactLockTableCleanup+0xa4)[0x558b2aa9fd74]
postgres: logical replication apply worker for subscription 16385 
(LockReleaseAll+0xbb)[0x558b2aa9fe7d]
postgres: logical replication apply worker for subscription 16385 
(+0x4588c6)[0x558b2aa2a8c6]
postgres: logical replication apply worker for subscription 16385 
(shmem_exit+0x6c)[0x558b2aa87eb1]
postgres: logical replication apply worker for subscription 16385 
(+0x4b5faa)[0x558b2aa87faa]
postgres: logical replication apply worker for subscription 16385 
(proc_exit+0xc)[0x558b2aa88031]
postgres: logical replication apply worker for subscription 16385 
(StartBackgroundWorker+0x147)[0x558b2aa0b4d9]
postgres: logical replication apply worker for subscription 16385 
(+0x43fdc1)[0x558b2aa11dc1]
postgres: logical replication apply worker for subscription 16385 
(+0x43ff3d)[0x558b2aa11f3d]
postgres: logical replication apply worker for subscription 16385 
(+0x440866)[0x558b2aa12866]
postgres: logical replication apply worker for subscription 16385 
(+0x440e12)[0x558b2aa12e12]
postgres: logical replication apply worker for subscription 16385 
(BackgroundWorkerInitializeConnection+0x0)[0x558b2aa14396]
postgres: logical replication apply worker for subscription 16385 
(main+0x21a)[0x558b2a932e21]

I understand that removing libpqwalreceiver.so (or the whole pginst/) is not
what happens in a production environment every day, but nonetheless it's a
new failure mode and it can produce many coredumps when testing.

IIUC, that assert will fail in case of any error raised between
ApplyWorkerMain()->logicalrep_worker_attach()->before_shmem_exit() and
ApplyWorkerMain()->InitializeApplyWorker()->BackgroundWorkerInitializeConnectionByOid()->InitPostgres().
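
To see the ordering problem in isolation, here is a self-contained C
analogy (plain-C stand-ins only; these are not the PostgreSQL APIs):

#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

static int backend_id = -1;       /* stand-in for MyProc->backendId */

static void
cleanup(void)
{
    assert(backend_id != -1);     /* stand-in for the lock.c Assert */
    printf("released locks for backend %d\n", backend_id);
}

int
main(void)
{
    atexit(cleanup);              /* stand-in for before_shmem_exit() */

    /* An early error -- e.g. a missing shared library -- exits before
     * backend_id is assigned, so cleanup() trips the assert. */
    if (getenv("SIMULATE_EARLY_ERROR"))
        exit(1);

    backend_id = 42;              /* stand-in for InitPostgres() */
    return 0;
}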

Best regards,
Alexander




Re: Support logical replication of DDLs

2023-04-26 Thread Amit Kapila
On Wed, Apr 26, 2023 at 12:01 PM Masahiko Sawada  wrote:
>
> On Wed, Apr 26, 2023 at 2:56 PM Amit Kapila  wrote:
> >
> > On Wed, Apr 26, 2023 at 10:01 AM Masahiko Sawada  
> > wrote:
> > >
> > > On Tue, Apr 25, 2023 at 12:58 PM Zhijie Hou (Fujitsu)
> > >  wrote:
> > > >
> > > > Apart from the above comments, I split the code related to verbose
> > > > mode into a separate patch. And here is the new version patch set.
> > > >
> > >
> > > As for DDL replication, we create event triggers to write deparsed DDL
> > > commands to WAL when creating a publication with the ddl option. The
> > > event triggers are recreated/dropped at ALTER/DROP PUBLICATION. I'm
> > > concerned it's possible that DDLs executed while such a publication
> > > does not exist are not replicated. For example, imagine the following
> > > steps,
> > >
> > > 1. CREATE PUBLICATION test_pub ... WITH (ddl = 'table');
> > > 2. CREATE SUBSCRIPTION test_sub ... PUBLICATION test_pub;
> > > 3. ALTER SUBSCRIPTION test_sub DISABLE;
> > > 4. DROP PUBLICATION test_pub;
> > > 5. CREATE PUBLICATION test_pub ... WITH (ddl = 'table');
> > > 6. ALTER SUBSCRIPTION test_sub ENABLE;
> > >
> > > DDLs executed between 4 and 5 won't be replicated.
> > >
> >
> > But we won't even send any DMLs between 4 and 5. In fact, WALSender
> > will give an error for those DMLs because the publication doesn't
> > exist in the historic snapshot it uses.
>
> You're right, I missed this point.
>
> > So, why do we expect DDLs between the Drop and
> > Create of the publication to be replicated?
>
> For example, suppose that a publication is created for a table and
> then a new column is added to the table between 4 and 5; subsequent
> INSERTs could fail due to the missing column. But it's not a problem,
> as you pointed out, since the user dropped the publication.
>

Right. I think we can add a note about this in the documentation if you
think it's needed, but OTOH it appears quite natural to me.
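
For concreteness, the window discussed above could be reduced to something
like this (a sketch only; the table and column names are invented):

-- assume t(id int) already exists on both nodes and test_sub is disabled
DROP PUBLICATION test_pub;
ALTER TABLE t ADD COLUMN note text;        -- not captured: no publication, no event trigger
CREATE PUBLICATION test_pub FOR TABLE t WITH (ddl = 'table');
-- after ALTER SUBSCRIPTION test_sub ENABLE:
INSERT INTO t (id, note) VALUES (1, 'x');  -- replicated, but the subscriber's t lacks "note"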

-- 
With Regards,
Amit Kapila.




Re: Add LZ4 compression in pg_dump

2023-04-26 Thread gkokolatos

--- Original Message ---
On Tuesday, April 25th, 2023 at 8:02 AM, Michael Paquier  
wrote:


> 
> 
> On Wed, Apr 12, 2023 at 07:53:53PM -0500, Justin Pryzby wrote:
> 
> > I doubt it - in the !HAVE_LIBZ case, it's currently an "if" statement
> > with nothing but a comment, which isn't a problem.
> > 
> > I think the only issue with an empty "if" is when you have no braces,
> > like:
> > 
> > if (...)
> > #if ...
> > something;
> > #endif
> > 
> > // problem here //
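
To spell out the pitfall with a self-contained example (invented names;
any symbol works in place of HAVE_LIBZ):

#include <stdio.h>

int
main(void)
{
    int use_compression = 0;

    if (use_compression)
#ifdef HAVE_LIBZ
        printf("compressing with zlib\n");
#endif
    printf("always meant to run\n");    /* with HAVE_LIBZ undefined, this
                                         * silently becomes the if-body */
    return 0;
}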
> 
> 
> (My apologies for the late reply.)
> 
> Still it could be easily messed up, and that's not a style that
> really exists in the tree, either, because there are always #else
> blocks set up in such cases. Another part that makes me a bit
> uncomfortable is that we would still call parse_compress_specification()
> twice, something that should not happen, but we are doing so on HEAD
> because the default compression_algorithm_str
> is "none" and we want to enforce "gzip" for custom and directory
> formats when building with zlib.
> 
> What about just moving this block a bit up, just before the
> compression spec parsing, then? If we set compression_algorithm_str,
> the specification is compiled with the expected default, once instead
> of twice.

For what it's worth, I think this would be the best approach. +1

Cheers,
//Georgios

> --
> Michael




Re: Option to not use ringbuffer in VACUUM, using it in failsafe mode

2023-04-26 Thread Masahiko Sawada
Hi,

On Sun, Apr 16, 2023 at 9:09 AM David Rowley  wrote:
>
> On Sat, 15 Apr 2023 at 12:59, David Rowley  wrote:
> > These are all valid points. I've attached a patch aiming to address
> > each of them.
>
> I tweaked this a little further and pushed it.
>

I realized that the value of the vacuum_buffer_usage_limit parameter in
postgresql.conf.sample doesn't have a unit:

#vacuum_buffer_usage_limit = 256        # size of vacuum and analyze buffer
                                        # access strategy ring.
                                        # 0 to disable vacuum buffer access strategy
                                        # range 128kB to 16GB

It works, but I think we might want to add the unit kB for
understandability and consistency with other GUC_UNIT_KB parameters.
I've attached a small patch that adds the unit and aligns the comment
indentation with the neighboring parameters.
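
With the patch, the sample entry would read roughly like this (a sketch;
the exact alignment in the attached patch may differ):

#vacuum_buffer_usage_limit = 256kB      # size of vacuum and analyze buffer
                                        # access strategy ring.
                                        # 0 to disable vacuum buffer access strategy
                                        # range 128kB to 16GB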

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com


add_unit_to_vacuum_buffer_usage_limit.patch
Description: Binary data


Re: [pg_rewind] use the passing callback instead of global function

2023-04-26 Thread Daniel Gustafsson
> On 26 Apr 2023, at 10:33, Richard Guo  wrote:
> 
> On Wed, Apr 26, 2023 at 9:51 AM Junwang Zhao  wrote:
> `local_traverse_files` and `libpq_traverse_files` both have a
> callback parameter but use the global process_source_file instead,
> which is bad for function encapsulation.
> 
> Nice catch.  This looks like an oversight introduced by 37d2ff38.

Agreed, I'll look at applying this after some testing.

> While this patch is doing the right thing, I'm wondering: since both
> kinds of source server (libpq and local) use the same function
> (i.e. process_source_file) to process the source file list for
> traverse_files operations, do we really need to provide a callback?  Or
> will there be some kind of source server that may use a different
> source-file processing function?


While there isn't one right now, removing the callback seems like imposing a
restriction that the refactoring in 37d2ff38 aimed to avoid.
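
To illustrate the bug class with a self-contained example (generic
stand-ins; this is not the pg_rewind source):

#include <stdio.h>

typedef void (*file_cb) (const char *path);

static void
default_cb(const char *path)
{
    printf("default: %s\n", path);
}

/* Buggy shape: ignores its parameter and always calls the global. */
static void
traverse_buggy(file_cb cb)
{
    (void) cb;                    /* unused -- that's the bug */
    default_cb("base/1/1259");
}

/* Fixed shape: honors whatever the caller passed in. */
static void
traverse_fixed(file_cb cb)
{
    cb("base/1/1259");
}

static void
custom_cb(const char *path)
{
    printf("custom: %s\n", path);
}

int
main(void)
{
    traverse_buggy(custom_cb);    /* surprises the caller: "default: ..." */
    traverse_fixed(custom_cb);    /* prints "custom: ..." as intended */
    return 0;
}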

--
Daniel Gustafsson





Re: [pg_rewind] use the passing callback instead of global function

2023-04-26 Thread Richard Guo
On Wed, Apr 26, 2023 at 9:51 AM Junwang Zhao  wrote:

> `local_traverse_files` and `libpq_traverse_files` both have a
> callback parameter but use the global process_source_file instead,
> which is bad for function encapsulation.


Nice catch.  This looks like an oversight introduced by 37d2ff38.

While this patch is doing the right thing, I'm wondering: since both
kinds of source server (libpq and local) use the same function
(i.e. process_source_file) to process the source file list for
traverse_files operations, do we really need to provide a callback?  Or
will there be some kind of source server that may use a different
source-file processing function?

Thanks
Richard


Re: Add two missing tests in 035_standby_logical_decoding.pl

2023-04-26 Thread Drouvot, Bertrand

Hi,

On 4/26/23 6:06 AM, vignesh C wrote:

> On Tue, 25 Apr 2023 at 12:51, Drouvot, Bertrand
>  wrote:
> Thanks for the updated patch.
> Few comments:


Thanks for looking at it!


> 1) subscriber_stdout and subscriber_stderr are not required for this
> test case; we could remove them. I was able to remove those variables
> and run the test successfully:
> +$node_subscriber->start;
> +
> +my %psql_subscriber = (
> +   'subscriber_stdin'  => '',
> +   'subscriber_stdout' => '',
> +   'subscriber_stderr' => '');
> +$psql_subscriber{run} = IPC::Run::start(
> +   [ 'psql', '-XA', '-f', '-', '-d', $node_subscriber->connstr('postgres') ],
> +   '<',
> +   \$psql_subscriber{subscriber_stdin},
> +   '>',
> +   \$psql_subscriber{subscriber_stdout},
> +   '2>',
> +   \$psql_subscriber{subscriber_stderr},
> +   $psql_timeout);

> I ran it like:
> my %psql_subscriber = (
> 'subscriber_stdin' => '');
> $psql_subscriber{run} = IPC::Run::start(
> [ 'psql', '-XA', '-f', '-', '-d', $node_subscriber->connstr('postgres') ],
> '<',
> \$psql_subscriber{subscriber_stdin},
> $psql_timeout);



Not using all 3 std* handles is also the case in, for example,
021_row_visibility.pl and 032_relfilenode_reuse.pl, where "stderr" is set
but does not seem to be used.

I don't think it's a problem to keep them all, and I think it's better to
have them redirected to dedicated places.


> 2) Also, we have changed the default timeout here; why is this change
> required:
>   my $node_cascading_standby =
> PostgreSQL::Test::Cluster->new('cascading_standby');
> +my $node_subscriber = PostgreSQL::Test::Cluster->new('subscriber');
>   my $default_timeout = $PostgreSQL::Test::Utils::timeout_default;
> +my $psql_timeout= IPC::Run::timer(2 * $default_timeout);


I think I used 021_row_visibility.pl as an example. But I agree there are
other .pl files that use the timeout_default as the psql_timeout, and that
the default is enough in our case. So V5, attached, uses the default.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

From 7d06ba14cce372b60e859b2274933b7c1a25b26a Mon Sep 17 00:00:00 2001
From: Bertrand Drouvot 
Date: Mon, 24 Apr 2023 05:13:23 +
Subject: [PATCH v5] Add subscribtion to the standby test in
 035_standby_logical_decoding.pl

Adding one test, to verify that subscribtion to the standby is possible.
---
 src/test/perl/PostgreSQL/Test/Cluster.pm  | 11 ++-
 .../t/035_standby_logical_decoding.pl | 91 ++-
 2 files changed, 99 insertions(+), 3 deletions(-)
   7.5% src/test/perl/PostgreSQL/Test/
  92.4% src/test/recovery/t/

diff --git a/src/test/perl/PostgreSQL/Test/Cluster.pm b/src/test/perl/PostgreSQL/Test/Cluster.pm
index 6f7f4e5de4..819667d42a 100644
--- a/src/test/perl/PostgreSQL/Test/Cluster.pm
+++ b/src/test/perl/PostgreSQL/Test/Cluster.pm
@@ -2644,7 +2644,16 @@ sub wait_for_catchup
}
if (!defined($target_lsn))
{
-   $target_lsn = $self->lsn('write');
+   my $isrecovery = $self->safe_psql('postgres', "SELECT pg_is_in_recovery()");
+   chomp($isrecovery);
+   if ($isrecovery eq 't')
+   {
+   $target_lsn = $self->lsn('replay');
+   }
+   else
+   {
+   $target_lsn = $self->lsn('write');
+   }
}
print "Waiting for replication conn "
  . $standby_name . "'s "
diff --git a/src/test/recovery/t/035_standby_logical_decoding.pl b/src/test/recovery/t/035_standby_logical_decoding.pl
index b8f5311fe9..f6d6447412 100644
--- a/src/test/recovery/t/035_standby_logical_decoding.pl
+++ b/src/test/recovery/t/035_standby_logical_decoding.pl
@@ -10,12 +10,17 @@ use PostgreSQL::Test::Cluster;
 use PostgreSQL::Test::Utils;
 use Test::More;
 
-my ($stdin, $stdout, $stderr, $cascading_stdout, $cascading_stderr, $ret, $handle, $slot);
+my ($stdin, $stdout,$stderr,
+   $cascading_stdout,  $cascading_stderr,  $subscriber_stdin,
+   $subscriber_stdout, $subscriber_stderr, $ret,
+   $handle,$slot);
 
 my $node_primary = PostgreSQL::Test::Cluster->new('primary');
 my $node_standby = PostgreSQL::Test::Cluster->new('standby');
my $node_cascading_standby = PostgreSQL::Test::Cluster->new('cascading_standby');
+my $node_subscriber = PostgreSQL::Test::Cluster->new('subscriber');
 my $default_timeout = $PostgreSQL::Test::Utils::timeout_default;
+my $psql_timeout= IPC::Run::timer($default_timeout);
 my $res;
 
 # Name for the physical slot on primary
@@ -267,7 +272,8 @@ $node_standby->init_from_backup(
has_streaming => 1,
has_restoring => 1);
 $node_standby->append_conf('postgresql.conf',
-   qq[primary_slot_name = '$primary_slotname']);
+   qq[primary_slot_name = '$primary_slotname'
+   max_replication_slots = 5]);
 $node_standby->start;
 $node_primary->wait_for_replay_catchup($node_standby);
 $node_standby->safe

Re: run pgindent on a regular basis / scripted manner

2023-04-26 Thread Peter Eisentraut

On 24.04.23 16:14, Tom Lane wrote:

> Peter Eisentraut  writes:
>> Does anyone find perltidy useful?  To me, it functions more like a
>> JavaScript compiler in that once you process the source code, it is no
>> longer useful for manual editing.  If we are going to have the buildfarm
>> check indentation and that is going to be extended to Perl code, I have
>> some concerns about that.
> 
> I certainly don't like its current behavior where adding/changing one
> line can have side-effects on nearby lines.  But we have a proposal
> to clean that up, and I'm cautiously optimistic that it'll be better
> in future.  Did you have other specific concerns?


I think the worst is how it handles multi-line data structures like

$newnode->command_ok(
[
'psql', '-X',
'-v',   'ON_ERROR_STOP=1',
'-c',   $upcmds,
'-d',   $oldnode->connstr($updb),
],
"ran version adaptation commands for database $updb");

or

$node->command_fails_like(
[
'pg_basebackup',   '-D',
"$tempdir/backup", '--compress',
$cft->[0]
],
qr/$cfail/,
'client ' . $cft->[2]);
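
For contrast, a hand-formatted version of the first call might read like
this (same arguments; the grouping is invented):

$newnode->command_ok(
    [
        'psql', '-X', '-v', 'ON_ERROR_STOP=1',
        '-c', $upcmds, '-d', $oldnode->connstr($updb),
    ],
    "ran version adaptation commands for database $updb");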

Perhaps that is included in the upcoming changes you are referring to?




Re: [PoC] pg_upgrade: allow to upgrade publisher node

2023-04-26 Thread Peter Eisentraut

On 24.04.23 14:03, Hayato Kuroda (Fujitsu) wrote:

>> so at least there's a good chance that they will still be at
>> shutdown, and will therefore send all the data to the subscribers? Having a
>> regression test for that scenario would also be a good idea.  Having an
>> uncommitted write transaction should be enough to cover it.
> 
> I think background_psql() can be used for the purpose. Before doing pg_upgrade
> --check, a transaction is opened and kept. It means that the confirmed_flush
> has not yet reached the current WAL position, but the check says OK
> because all slots are active.


A suggestion: You could write some/most tests against test_decoding 
rather than the publication/subscription system.  That way, you can 
avoid many timing issues in the tests and you can check more exactly 
that the slots produce the output you want.  This would also help ensure 
that this new facility works for other logical decoding output plugins 
besides the built-in one.
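
A rough sketch of what such a test could look like (node, slot, and table
names invented here; this is not from the patch):

# On the old node: create a test_decoding slot and leave decodable
# changes behind before a clean shutdown.
$old_node->safe_psql('postgres',
    "SELECT pg_create_logical_replication_slot('upgrade_slot', 'test_decoding')");
$old_node->safe_psql('postgres', 'CREATE TABLE t (a int)');
$old_node->safe_psql('postgres', 'INSERT INTO t VALUES (1)');
$old_node->stop;    # clean shutdown should advance confirmed_flush

# ... run pg_upgrade, start the new node, then verify the slot survived
# and still decodes:
my $out = $new_node->safe_psql('postgres',
    "SELECT data FROM pg_logical_slot_get_changes('upgrade_slot', NULL, NULL)");
like($out, qr/INSERT/, 'slot preserved by pg_upgrade still decodes changes');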






Re: Overhauling "Routine Vacuuming" docs, particularly its handling of freezing

2023-04-26 Thread John Naylor
On Tue, Apr 25, 2023 at 4:58 AM Peter Geoghegan  wrote:
>
> There are also very big structural problems with "Routine Vacuuming",
> that I also propose to do something about. Honestly, it's a huge mess
> at this point. It's nobody's fault in particular; there has been
> accretion after accretion added, over many years. It is time to
> finally bite the bullet and do some serious restructuring. I'm hoping
> that I don't get too much push back on this, because it's already very
> difficult work.

Now is a great time to revise this section, in my view. (I myself am about
ready to get back to testing and writing for the task of removing that
"obnoxious hint".)

> Attached patch series shows what I consider to be a much better
> overall structure. To make this convenient to take a quick look at, I
> also attach a prebuilt version of routine-vacuuming.html (not the only
> page that I've changed, but the most important set of changes by far).
>
> This initial version is still quite lacking in overall polish, but I
> believe that it gets the general structure right. That's what I'd like
> to get feedback on right now: can I get agreement with me about the
> general nature of the problem? Does this high level direction seem
> like the right one?

I believe the high-level direction is sound, and some details have been
discussed before.

> The following list is a summary of the major changes that I propose:
>
> 1. Restructures the order of items to match the actual processing
> order within VACUUM (and ANALYZE), rather than jumping from VACUUM to
> ANALYZE and then back to VACUUM.
>
> This flows a lot better, which helps with later items that deal with
> freezing/wraparound.

Seems logical.

> 2. Renamed "Preventing Transaction ID Wraparound Failures" to
> "Freezing to manage the transaction ID space". Now we talk about
> wraparound as a subtopic of freezing, not vice-versa. (This is a
> complete rewrite, as described by later items in this list).

+1

> 3. All of the stuff about modulo-2^32 arithmetic is moved to the
> storage chapter, where we describe the heap tuple header format.

It does seem to be an excessive level of detail for this chapter, so +1.
Speaking of excessive detail, however...(skipping ahead)

+<note>
+ <para>
+  There is no fundamental difference between a
+  <command>VACUUM</command> run during anti-wraparound
+  autovacuuming and a <command>VACUUM</command> that happens to
+  use the aggressive strategy (whether run by autovacuum or
+  manually issued).
+ </para>
+</note>

I don't see the value, from the user's perspective, of mentioning this at
all, much less calling it out as a Note. Imagine a user who has been burnt
by non-cancellable vacuums. How would they interpret this statement?

> It seems crazy to me that the second sentence in our discussion of
> wraparound/freezing is still:
>
> "But since transaction IDs have limited size (32 bits) a cluster that
> runs for a long time (more than 4 billion transactions) would suffer
> transaction ID wraparound: the XID counter wraps around to zero, and
> all of a sudden transactions that were in the past appear to be in the
> future"

Hah!

> 4. No more separate section for MultiXactID freezing -- that's
> discussed as part of the discussion of page-level freezing.
>
> Page-level freezing takes place without regard to the trigger
> condition for freezing. So the new approach to freezing has a fixed
> idea of what it means to freeze a given page (what physical
> modifications it entails). This means that having a separate sect3
> subsection for MultiXactIds now makes no sense (if it ever did).

I have no strong opinion on that.

> 5. The top-level list of maintenance tasks has a new addition: "To
> truncate obsolescent transaction status information, when possible".

+1

> 6. Rename the whole "Routine Vacuuming" section to "Autovacuum
> Maintenance Tasks".
>
> This is what we should be emphasizing over manually run VACUUMs.
> Besides, the current title just seems wrong -- we're talking about
> ANALYZE just as much as VACUUM.

Seems more accurate. On top of that, "Routine vacuuming" slightly implies
manual vacuums.

I've only taken a cursory look, but will look more closely as time permits.

(Side note: My personal preference for rough doc patches would be to leave
out spurious whitespace changes. That not only includes indentation, but
also paragraphs where many of the words haven't changed at all, but every
line has changed to keep the paragraph tidy. Seems like more work for both
the author and the reviewer.)

--
John Naylor
EDB: http://www.enterprisedb.com