Re: dfmgr additional ABI version fields

2021-11-18 Thread Michael Paquier
On Wed, Oct 13, 2021 at 12:50:38PM -0400, Robert Haas wrote:
> I'm not a different vendor, but I do work on different code than you
> do, and I like this. Advanced Server accidentally dodges this problem
> at present by shipping with a different FUNC_MAX_ARGS value, but this
> is much cleaner.

I am pretty sure that Greenplum could benefit from something like
that.  As a whole, using a string looks like a good idea for that.

> Would it be reasonable to consider something similar for the control
> file, for the benefit of distributions that are not the same on disk?

Hmm.  Wouldn't that cause more harm than actual benefit?
--
Michael




Re: wait event and archive_command

2021-11-18 Thread Michael Paquier
On Thu, Nov 18, 2021 at 10:04:57AM +0530, Bharath Rupireddy wrote:
> Yeah let's not do that. I'm fine with the
> wait_event_for_archive_command_v2.patch as is.

Switched the patch to RfC, then.
--
Michael




Re: Skipping logical replication transactions on subscriber side

2021-11-18 Thread Greg Nancarrow
On Fri, Nov 19, 2021 at 5:52 PM Amit Kapila  wrote:
> Why would that Assert be hit? We seem to always pass 'create' as
> true, so it should create a new entry. I think a similar situation can
> happen for functions, and it will probably be cleaned up in the next
> vacuum cycle.
>
Oops, I missed that too. So at worst, vacuum will clean it up in the
out-of-order SUBSCRIPTIONPURGE, SUBWORKERERROR case.

But I still think the current code may not correctly handle
first_error_time/last_error_time timestamps if out-of-order
SUBWORKERERROR messages occur, right?

Regards,
Greg Nancarrow
Fujitsu Australia




Re: drop tablespace failed when location contains .. on win32

2021-11-18 Thread Michael Paquier
On Wed, Nov 10, 2021 at 05:43:31PM -0500, Tom Lane wrote:
> Another thing I happened to notice is that join_path_components
> is going out of its way to not generate "foo/./bar", but if
> we are fixing canonicalize_path to be able to delete the "./",
> that seems like a waste of code now.
> 
> I am not entirely convinced that 0002 isn't re-introducing the
> security hole that the existing code seeks to plug.  That one
> is going to require more justification.

At the same time, do we have any need for doing 0002 at all if
we do 0001?  The paths are canonicalized before checking them in
path_contains_parent_reference().

> I concur with the upthread comments that there's little chance
> we'll commit 0003 as-is; the code-to-benefit ratio is too high.
> Instead, you might consider adding test_canonicalize_path in
> src/test/regress/regress.c, and then adding a smaller number
> of regression test cases using that.

Sounds like a good idea to me.  I would move these into misc.source for
anything that requires an absolute path.

0001 is indeed in need of more comments and documentation so that one
does not get lost when reading through this code in the future.  Changes
in trim_directory(), for example, should explain what is returned and
why.

+   isabs = is_absolute_path(path);
+   tmppath = strdup(path);
If possible, it would be nice to cut any need for malloc() allocations
in this code.
--
Michael




A spot of redundant initialization of brin memtuple

2021-11-18 Thread Richard Guo
Happened to notice this while reading through the code. The BrinMemTuple
is already initialized inside brin_new_memtuple(), as part of its
creation, so we don't need to initialize it again outside.

diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c
index ccc9fa0959..67a277e1f9 100644
--- a/src/backend/access/brin/brin.c
+++ b/src/backend/access/brin/brin.c
@@ -1261,8 +1261,6 @@ initialize_brin_buildstate(Relation idxRel,
BrinRevmap *revmap,
state->bs_bdesc = brin_build_desc(idxRel);
state->bs_dtuple = brin_new_memtuple(state->bs_bdesc);

-   brin_memtuple_initialize(state->bs_dtuple, state->bs_bdesc);
-
return state;
 }

Thanks
Richard


Re: CREATE tab completion

2021-11-18 Thread Michael Paquier
On Fri, Nov 19, 2021 at 04:19:01PM +0900, Ken Kato wrote:
> I assume Michael has committed the modified version of the patch.
> Therefore, I changed the status to "committed" in Commitfest 2022-01.
> https://commitfest.postgresql.org/36/3418/

Thanks, I did not notice that :)
--
Michael




Re: Some RELKIND macro refactoring

2021-11-18 Thread Michael Paquier
On Mon, Nov 01, 2021 at 07:28:16AM +0100, Peter Eisentraut wrote:
> 
> Small rebase of this patch set.

Regarding 0001, I find the existing code a bit more self-documenting
if we keep those checks in flagInhAttrs() and guessConstraintInheritance(),
so I would rather leave these alone.

I like 0002, which makes things a bit easier to go through across
various places where relkind-only checks are replaced by those new
macros.  There is one thing I bumped into, though..

>  if (create_storage)
>  {
> -switch (rel->rd_rel->relkind)
> -{
> -case RELKIND_VIEW:
> -case RELKIND_COMPOSITE_TYPE:
> -case RELKIND_FOREIGN_TABLE:
> -case RELKIND_PARTITIONED_TABLE:
> -case RELKIND_PARTITIONED_INDEX:
> -Assert(false);
> -break;
> -[ .. deletions .. ]
> -}
> +if (RELKIND_HAS_TABLE_AM(rel->rd_rel->relkind))
> +table_relation_set_new_filenode(rel, &rel->rd_node,
> +relpersistence,
> +relfrozenxid, relminmxid);
> +else
> +RelationCreateStorage(rel->rd_node, relpersistence);
>  }

I think that you should keep this block as it is now.  The case where
a relkind neither supports table AMs nor requires storage would no
longer be caught, and this new code would just attempt to create
storage, so it seems to me that the existing code is better when it
comes to preventing mistakes from developers.  You could as well use an
else if (RELKIND_INDEX || RELKIND_SEQUENCE) branch for
RelationCreateStorage(), and fall back to Assert(false) in the else
branch, to get the same result as the original without reducing its
level of safety.
--
Michael




Re: CREATE tab completion

2021-11-18 Thread Ken Kato

I have fixed (or discarded) that, and the parts for sequences, domains
and transforms remained.  That looked like good enough on its own, so
applied those parts of the patch.
--
Michael

Thank you very much!

I assume Michael has committed the modified version of the patch.
Therefore, I changed the status to "committed" in Commitfest 2022-01.
https://commitfest.postgresql.org/36/3418/

Best wishes,

--
Ken Kato
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: Skipping logical replication transactions on subscriber side

2021-11-18 Thread vignesh C
On Fri, Nov 19, 2021 at 12:22 PM Amit Kapila  wrote:
>
> On Fri, Nov 19, 2021 at 11:09 AM vignesh C  wrote:
> >
> > On Fri, Nov 19, 2021 at 9:22 AM Amit Kapila  wrote:
> > >
> > > On Thu, Nov 18, 2021 at 5:10 PM Masahiko Sawada  
> > > wrote:
> > > >
> > > > On Thu, Nov 18, 2021 at 5:45 PM tanghy.f...@fujitsu.com
> > > >  wrote:
> > > > >
> > > > > On Tuesday, November 16, 2021 2:31 PM Masahiko Sawada 
> > > > >  wrote:
> > > > > >
> > > > > > Right. I've fixed this issue and attached an updated patch.
> > > > > >
> > > > > >
> > > > >
> > > > > Thanks for your patch.
> > > > >
> > > > > I read the discussion about stats entries for table sync worker[1], 
> > > > > the
> > > > > statistics are retained after table sync worker finished its jobs and 
> > > > > user can remove
> > > > > them via pg_stat_reset_subscription_worker function.
> > > > >
> > > > > But I notice that, if a table sync worker finished its jobs, the 
> > > > > error reported by
> > > > > this worker will not be shown in the pg_stat_subscription_workers 
> > > > > view. (It seemed caused by this condition: "WHERE srsubstate <> 'r'") 
> > > > > Is it intentional? I think this may cause a result that users don't 
> > > > > know the statistics are still exist, and won't remove the statistics 
> > > > > manually. And that is not friendly to users' storage, right?
> > > > >
> > > >
> > > > You're right. The condition "WHERE srsubstate <> 'r'" should be removed.
> > > > I'll do that change in the next version patch. Thanks!
> > > >
> > >
> > > One more thing you might want to consider for the next version is
> > > whether to rename the columns as discussed in the related thread [1]?
> > > I think we should consider future work and name them accordingly.
> > >
> > > [1] - 
> > > https://www.postgresql.org/message-id/CAA4eK1KR41bRUuPeNBSGv2%2Bq7ROKukS3myeAUqrZMD8MEwR0DQ%40mail.gmail.com
> >
> > Since the statistics collector process uses a UDP socket, the
> > sequencing of the messages is not guaranteed. Will there be a problem
> > if a subscription is dropped and the stats collector receives
> > PGSTAT_MTYPE_SUBSCRIPTIONPURGE first, removing the subscription worker
> > entry, and then receives PGSTAT_MTYPE_SUBWORKERERROR (this order can
> > happen because of the UDP socket)? I'm not sure if the Assert will be
> > a problem in this case.
> >
>
> Why would that Assert be hit? We seem to always pass 'create' as
> true, so it should create a new entry. I think a similar situation can
> happen for functions, and it will probably be cleaned up in the next
> vacuum cycle.

Since we are passing true, that Assert will not be hit; sorry, I
failed to notice that. It will create a new entry, as you rightly
pointed out. Since the cleanup is handled by vacuum, and the current
code already works that way, I felt no need to make any change.

Regards,
Vignesh




Re: Skipping logical replication transactions on subscriber side

2021-11-18 Thread Amit Kapila
On Fri, Nov 19, 2021 at 11:09 AM vignesh C  wrote:
>
> On Fri, Nov 19, 2021 at 9:22 AM Amit Kapila  wrote:
> >
> > On Thu, Nov 18, 2021 at 5:10 PM Masahiko Sawada  
> > wrote:
> > >
> > > On Thu, Nov 18, 2021 at 5:45 PM tanghy.f...@fujitsu.com
> > >  wrote:
> > > >
> > > > On Tuesday, November 16, 2021 2:31 PM Masahiko Sawada 
> > > >  wrote:
> > > > >
> > > > > Right. I've fixed this issue and attached an updated patch.
> > > > >
> > > > >
> > > >
> > > > Thanks for your patch.
> > > >
> > > > I read the discussion about stats entries for table sync worker[1], the
> > > > statistics are retained after table sync worker finished its jobs and 
> > > > user can remove
> > > > them via pg_stat_reset_subscription_worker function.
> > > >
> > > > But I notice that, if a table sync worker finished its jobs, the error 
> > > > reported by
> > > > this worker will not be shown in the pg_stat_subscription_workers view. 
> > > > (It seemed caused by this condition: "WHERE srsubstate <> 'r'") Is it 
> > > > intentional? I think this may lead to users not knowing that the
> > > > statistics still exist, so they won't remove the statistics manually.
> > > > And that is not friendly to users' storage, right?
> > > >
> > >
> > > You're right. The condition "WHERE srsubstate <> 'r'" should be removed.
> > > I'll do that change in the next version patch. Thanks!
> > >
> >
> > One more thing you might want to consider for the next version is
> > whether to rename the columns as discussed in the related thread [1]?
> > I think we should consider future work and name them accordingly.
> >
> > [1] - 
> > https://www.postgresql.org/message-id/CAA4eK1KR41bRUuPeNBSGv2%2Bq7ROKukS3myeAUqrZMD8MEwR0DQ%40mail.gmail.com
>
> Since the statistics collector process uses a UDP socket, the
> sequencing of the messages is not guaranteed. Will there be a problem
> if a subscription is dropped and the stats collector receives
> PGSTAT_MTYPE_SUBSCRIPTIONPURGE first, removing the subscription worker
> entry, and then receives PGSTAT_MTYPE_SUBWORKERERROR (this order can
> happen because of the UDP socket)? I'm not sure if the Assert will be
> a problem in this case.
>

Why would that Assert be hit? We seem to always pass 'create' as
true, so it should create a new entry. I think a similar situation can
happen for functions, and it will probably be cleaned up in the next
vacuum cycle.

-- 
With Regards,
Amit Kapila.




Re: Skipping logical replication transactions on subscriber side

2021-11-18 Thread Greg Nancarrow
On Fri, Nov 19, 2021 at 4:39 PM vignesh C  wrote:
>
> Since the statistics collector process uses a UDP socket, the
> sequencing of the messages is not guaranteed. Will there be a problem
> if a subscription is dropped and the stats collector receives
> PGSTAT_MTYPE_SUBSCRIPTIONPURGE first, removing the subscription worker
> entry, and then receives PGSTAT_MTYPE_SUBWORKERERROR (this order can
> happen because of the UDP socket)? I'm not sure if the Assert will be
> a problem in this case. If this scenario is possible, we could just
> silently return in that case.
>

Given that the message sequencing is not guaranteed, it looks like
that Assert and the current code after it won't handle that scenario
well. Silently returning if subwentry is NULL does seem like the way
to deal with that possibility.
Doesn't this possibility of out-of-sequence messaging due to UDP
similarly mean that "first_error_time" and "last_error_time" may not
be currently handled correctly?

Regards,
Greg Nancarrow
Fujitsu Australia




Re: Skipping logical replication transactions on subscriber side

2021-11-18 Thread vignesh C
On Fri, Nov 19, 2021 at 9:22 AM Amit Kapila  wrote:
>
> On Thu, Nov 18, 2021 at 5:10 PM Masahiko Sawada  wrote:
> >
> > On Thu, Nov 18, 2021 at 5:45 PM tanghy.f...@fujitsu.com
> >  wrote:
> > >
> > > On Tuesday, November 16, 2021 2:31 PM Masahiko Sawada 
> > >  wrote:
> > > >
> > > > Right. I've fixed this issue and attached an updated patch.
> > > >
> > > >
> > >
> > > Thanks for your patch.
> > >
> > > I read the discussion about stats entries for table sync worker[1], the
> > > statistics are retained after table sync worker finished its jobs and 
> > > user can remove
> > > them via pg_stat_reset_subscription_worker function.
> > >
> > > But I notice that, if a table sync worker finished its jobs, the error 
> > > reported by
> > > this worker will not be shown in the pg_stat_subscription_workers view. 
> > > (It seemed caused by this condition: "WHERE srsubstate <> 'r'") Is it 
> > > intentional? I think this may lead to users not knowing that the
> > > statistics still exist, so they won't remove the statistics manually. And
> > > that is not friendly to users' storage, right?
> > >
> >
> > You're right. The condition "WHERE srsubstate <> 'r'" should be removed.
> > I'll do that change in the next version patch. Thanks!
> >
>
> One more thing you might want to consider for the next version is
> whether to rename the columns as discussed in the related thread [1]?
> I think we should consider future work and name them accordingly.
>
> [1] - 
> https://www.postgresql.org/message-id/CAA4eK1KR41bRUuPeNBSGv2%2Bq7ROKukS3myeAUqrZMD8MEwR0DQ%40mail.gmail.com

Since the statistics collector process uses a UDP socket, the
sequencing of the messages is not guaranteed. Will there be a problem
if a subscription is dropped and the stats collector receives
PGSTAT_MTYPE_SUBSCRIPTIONPURGE first, removing the subscription worker
entry, and then receives PGSTAT_MTYPE_SUBWORKERERROR (this order can
happen because of the UDP socket)? I'm not sure if the Assert will be
a problem in this case. If this scenario is possible, we could just
silently return in that case.

+static void
+pgstat_recv_subworker_error(PgStat_MsgSubWorkerError *msg, int len)
+{
+   PgStat_StatDBEntry *dbentry;
+   PgStat_StatSubWorkerEntry *subwentry;
+
+   dbentry = pgstat_get_db_entry(msg->m_databaseid, true);
+
+   /* Get the subscription worker stats */
+   subwentry = pgstat_get_subworker_entry(dbentry, msg->m_subid,
+
msg->m_subrelid, true);
+   Assert(subwentry);
+
+   /*
+* Update only the counter and last error timestamp if we received
+* the same error again
+*/

Thoughts?

Regards,
Vignesh




Re: pg_get_publication_tables() output duplicate relid

2021-11-18 Thread Amit Kapila
On Fri, Nov 19, 2021 at 7:19 AM Amit Langote  wrote:
>
> On Thu, Nov 18, 2021 at 1:53 PM Amit Kapila  wrote:
> >
> > AFAICU, there are actually two problems related to
> > pg_publication_tables view that are being discussed: (a) when
> > 'publish_via_partition_root' is true then it returns both parent and
> > child tables (provided both are part of publication), this can lead to
> > duplicate data on the subscriber; the fix for this is being discussed
> > in thread [1]. (b) when 'publish_via_partition_root' is false, then it
> > returns duplicate entries for child tables (provided both are part of
> > publication), this problem is being discussed in this thread.
> >
> > As per the proposed patches, it seems both need a separate fix, we can
> > fix them together if we want but I am not sure. Is your understanding
> > the same?
>
> Actually, I'm seeing both (a) and (b) as more or less the same thing.
> If publish_via_partition_root=true, we'd want to remove a partition
> from the list if the root parent is also present.  If false, we'd want
> to remove a partition from the list because it'd also be added by way
> of expanding the root.
>
> I know that (a) causes the partition-double-sync problem that could be
> fixed by partition OID de-duplication as proposed in the other thread.
> As a comment on that, it'd be nice to see a test case added to
> src/test/subscription suite in that patch, because
> partition-double-sync is the main problem statement I'd think.
>
> Like Bharath said, I don't see (b) as much of a problem, because the
> subscriber applies DISTINCT when querying pg_publication_tables()
> anyway.  Other users, if any out there, may be doing the same by
> following core subscription code's example.
>

Hmm, I don't think we can assume that current or future users of
pg_publication_tables() will use it with DISTINCT unless we specify
that in the docs. However, I guess we might want to do it only for
HEAD, as there is no direct field report for this and we have a
workaround anyway.

>  That said, I am not
> opposed to applying the patch being proposed here, though I guess we
> don't have any src/test/subscription test case for this one?
>

Yeah, we can add tests for this and the other case.

>  As in,
> do we know of any replication (initial/streaming) misbehavior caused
> by the duplicate partition OIDs in this case or is the only problem
> that pg_publication_tables output looks odd?
>

The latter one, but I think we should either document this or change it,
as we can't assume users will follow what the subscriber-side code does.

> > > unless
> > > we know of a bigger problem that requires us to hack the subscriber
> > > side of things too.
> > >
> >
> > There is yet another issue that might need subscriber side change. See
> > the second issue summarized by Hou-San in the email[2]. I feel it is
> > better to tackle that separately.
> >
>
> The problematic case is attaching the partition *after* the subscriber
> has already marked the root parent as synced and/or ready for
> replication.  Refreshing the subscription doesn't help it discover the
> newly attached partition, because a publish_via_partition_root only
> ever tells about the root parent, which would be already synced, so
> the subscriber would think there's nothing to copy.
>

Okay, I see this could be a problem but I haven't tried to reproduce it.

> > Anyway, if this is a problem
> > we need to figure the solution for this separately.
>
> Sure, we might need to do that after all.  Though it might be a good
> idea to be sure that we won't need to reconsider the fix we push for
> the issue(s) being discussed here and elsewhere, because I suspect
> that the solution to the problem I mentioned is likely to involve
> tweaking pg_publication_tables view output.
>

Okay, so we have four known problems in the same area. The first was
reported at the start of this thread, then the two summarized by
Hou-San in his email [1], and the fourth is the attaching of partitions
after the initial sync, as you mentioned. It makes sense to have some
high-level idea of how to solve each of the problems before pushing a
fix for any one particular problem, as one fix could have an impact on
the other fixes. I'll think it over, but please do let me know if you
have any ideas.


[1] - 
https://www.postgresql.org/message-id/OS0PR01MB5716C756312959F293A822C794869%40OS0PR01MB5716.jpnprd01.prod.outlook.com

-- 
With Regards,
Amit Kapila.




Re: row filtering for logical replication

2021-11-18 Thread Peter Smith
On Fri, Nov 19, 2021 at 4:15 PM Greg Nancarrow  wrote:
>
> On Thu, Nov 18, 2021 at 12:33 PM Peter Smith  wrote:
> >
> > PSA new set of v40* patches.
> >
>
> I notice that in the 0001 patch, it adds a "relid" member to the
> PublicationRelInfo struct:
>
> src/include/catalog/pg_publication.h
>
>  typedef struct PublicationRelInfo
>  {
> +  Oid relid;
> Relation relation;
> +  Node *whereClause;
>  } PublicationRelInfo;
>
> It appears that this new member is not actually required, as the relid
> can be simply obtained from the existing "relation" member - using the
> RelationGetRelid() macro.
>

+1

--
Kind Regards,
Peter Smith.
Fujitsu Australia




Re: row filtering for logical replication

2021-11-18 Thread Greg Nancarrow
On Thu, Nov 18, 2021 at 12:33 PM Peter Smith  wrote:
>
> PSA new set of v40* patches.
>

I notice that in the 0001 patch, it adds a "relid" member to the
PublicationRelInfo struct:

src/include/catalog/pg_publication.h

 typedef struct PublicationRelInfo
 {
+  Oid relid;
Relation relation;
+  Node *whereClause;
 } PublicationRelInfo;

It appears that this new member is not actually required, as the relid
can be simply obtained from the existing "relation" member - using the
RelationGetRelid() macro.

Regards,
Greg Nancarrow
Fujitsu Australia




Re: Should we improve "PID XXXX is not a PostgreSQL server process" warning for pg_terminate_backend(<>)?

2021-11-18 Thread Bharath Rupireddy
On Thu, Nov 18, 2021 at 5:01 PM Bharath Rupireddy
 wrote:
> The following is what I came up with after looking at other existing
> messages, like [1], and the review comments:
> errmsg("cannot send signal to postmaster %d", pid)  --> the process
> is the postmaster but the caller isn't allowed to signal it.
> errmsg("cannot send signal to PostgreSQL server process %d", pid)
> --> the process is a PostgreSQL process but the caller isn't allowed
> to signal it.
> errmsg("PID %d is not a PostgreSQL backend process", pid)  --> it may
> be another postgres process like the syslogger or stats collector, or
> a non-postgres process, but not a backend process.
>
> Thoughts?
>
> [1]
> (errmsg("could not send signal to process %d: %m", pid)));
> (errmsg("failed to send signal to postmaster: %m")));

Here's the v4 patch with the above changes, the output looks like [1].
Please review it further.

[1]
postgres=# select pg_terminate_backend(2407245);
WARNING:  cannot send signal to postmaster 2407245
 pg_terminate_backend
----------------------
 f
(1 row)

postgres=# select pg_terminate_backend(2407246);
WARNING:  cannot send signal to PostgreSQL server process 2407246
 pg_terminate_backend
----------------------
 f
(1 row)

postgres=# select pg_terminate_backend(2407286);
WARNING:  PID 2407286 is not a PostgreSQL backend process
 pg_terminate_backend
----------------------
 f
(1 row)

Regards,
Bharath Rupireddy.
From 24800ffa999f527ac9c526ef11e6679d18269d39 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy 
Date: Fri, 19 Nov 2021 02:09:49 +
Subject: [PATCH v4] Improve PID  is not a PostgreSQL server process
 message

Currently, pg_signal_backend emits the generic WARNING "PID  is
not a PostgreSQL server process" for the postmaster and auxiliary
processes, which doesn't sound sensible. This patch tries to
improve this message.
---
 src/backend/storage/ipc/signalfuncs.c | 38 +++
 1 file changed, 33 insertions(+), 5 deletions(-)

diff --git a/src/backend/storage/ipc/signalfuncs.c b/src/backend/storage/ipc/signalfuncs.c
index de69d60e79..8d0f0fea65 100644
--- a/src/backend/storage/ipc/signalfuncs.c
+++ b/src/backend/storage/ipc/signalfuncs.c
@@ -48,7 +48,17 @@
 static int
 pg_signal_backend(int pid, int sig)
 {
-	PGPROC	   *proc = BackendPidGetProc(pid);
+	PGPROC	   *proc;
+
+	if (PostmasterPid == pid)
+	{
+		ereport(WARNING,
+(errmsg("cannot send signal to postmaster %d", pid)));
+
+		return SIGNAL_BACKEND_ERROR;
+	}
+
+	proc = BackendPidGetProc(pid);
 
 	/*
 	 * BackendPidGetProc returns NULL if the pid isn't valid; but by the time
@@ -61,11 +71,29 @@ pg_signal_backend(int pid, int sig)
 	if (proc == NULL)
 	{
 		/*
-		 * This is just a warning so a loop-through-resultset will not abort
-		 * if one backend terminated on its own during the run.
+		 * AuxiliaryProcs are still PostgreSQL server processes, do a little
+		 * more work and report a proper warning that says we cannot signal
+		 * them.
+		 *
+		 * For an auxiliary process, retrieve process info from AuxiliaryProcs
+		 * stored in shared-memory.
 		 */
-		ereport(WARNING,
-(errmsg("PID %d is not a PostgreSQL server process", pid)));
+		proc = AuxiliaryPidGetProc(pid);
+
+		if (proc)
+			ereport(WARNING,
+	(errmsg("cannot send signal to PostgreSQL server process %d",
+			pid)));
+		else
+		{
+			/*
+			 * This is just a warning so a loop-through-resultset will not
+			 * abort if one backend terminated on its own during the run.
+			 */
+			ereport(WARNING,
+	(errmsg("PID %d is not a PostgreSQL backend process", pid)));
+		}
+
 		return SIGNAL_BACKEND_ERROR;
 	}
 
-- 
2.25.1



Re: CREATE tab completion

2021-11-18 Thread Michael Paquier
On Thu, Nov 18, 2021 at 05:20:58PM +0900, Kyotaro Horiguchi wrote:
> So we could use COMPLETE_WITH_CS instead so that CREATE SEQUENCE
> behavess the same way with the existing behavior of TYPE.

Makes sense.

Another issue I have noticed with the patch is that it forgets to
apply quotes to the FROM/TO clauses of CREATE CONVERSION, making the
queries fail.  \encoding does not care about that, but we do for this
set of DDLs.  So I have discarded this part.

One extra issue was in CREATE TRANSFORM, where we should handle "OR
REPLACE" like the others.  And a last issue I had was with CREATE
LANGUAGE, where we would miss TRUSTED.  TRUSTED itself leads to no
further completion, but we had better offer it where we can.  That
makes things a bit trickier once you add the handling of "OR REPLACE"
on top.

I have fixed (or discarded) that, and the parts for sequences, domains
and transforms remained.  That looked like good enough on its own, so
applied those parts of the patch.
--
Michael




Re: row filtering for logical replication

2021-11-18 Thread Amit Kapila
On Fri, Nov 19, 2021 at 5:35 AM Peter Smith  wrote:
>
> On Mon, Nov 15, 2021 at 9:31 PM Amit Kapila  wrote:
> >
> ...
> >
> > Few comments on the latest set of patches (v39*)
> > ===
> > 0001*
> > 1.
> >  ObjectAddress
> > -publication_add_relation(Oid pubid, PublicationRelInfo *targetrel,
> > +publication_add_relation(Oid pubid, PublicationRelInfo *pri,
> >   bool if_not_exists)
> >  {
> >   Relation rel;
> >   HeapTuple tup;
> >   Datum values[Natts_pg_publication_rel];
> >   bool nulls[Natts_pg_publication_rel];
> > - Oid relid = RelationGetRelid(targetrel->relation);
> > + Relationtargetrel = pri->relation;
> >
> > I don't think such a renaming (targetrel-->pri) is warranted for this
> > patch. If we really want something like this, we can probably do it in
> > a separate patch but I suggest we can do that as a separate patch.
> >
>
> The name "targetrel" implies it is a Relation. (and historically, this
> arg once was "Relation *targetrel").
>
> Then when the PublicationRelInfo struct was introduced the arg name
> was not changed and it became "PublicationRelInfo *targetrel". But at
> that time PublicationRelInfo was just a simple wrapper for a Relation
> so that was probably ok.
>
> But now this Row-Filter patch has added more new members to
> PublicationRelInfo, so IMO the name change is helpful otherwise it
> seems misleading to continue calling it like it was still just a
> Relation.
>

Okay, that sounds reasonable.


-- 
With Regards,
Amit Kapila.




Re: row filtering for logical replication

2021-11-18 Thread Amit Kapila
On Fri, Nov 19, 2021 at 3:16 AM Peter Smith  wrote:
>
> On Thu, Nov 18, 2021 at 4:32 PM Peter Smith  wrote:
> >
> > On Mon, Nov 15, 2021 at 9:31 PM Amit Kapila  wrote:
> > >
> > > On Wed, Nov 10, 2021 at 12:36 PM Peter Smith  
> > > wrote:
> > > >
> > > > On Mon, Nov 8, 2021 at 5:53 PM houzj.f...@fujitsu.com
> > > >  wrote:
> > > > >
> > > > > 3) v37-0005
> > > > >
> > > > > - no parse nodes of any kind other than Var, OpExpr, Const, BoolExpr, 
> > > > > FuncExpr
> > > > >
> > > > > I think there could be other node type which can also be considered 
> > > > > as simple
> > > > > expression, for exmaple T_NullIfExpr.
> > > >
> > > > The current walker restrictions are from a previously agreed decision
> > > > by Amit/Tomas [1] and from an earlier suggestion from Andres [2] to
> > > > keep everything very simple for a first version.
> > > >
> > > > Yes, you are right, there might be some additional node types that
> > > > might be fine, but at this time I don't want to add anything different
> > > > without getting their approval to do so. Anyway, additions like this
> > > > are all candidates for a future version of this row-filter feature.
> > > >
> > >
> > > I think we can consider T_NullIfExpr unless you see any problem with the 
> > > same.
> >
> > Added in v40 [1]
> >
>
> I've noticed that row-filters that are testing NULL cannot pass the
> current expression validation restrictions.
>
> e.g.1
> test_pub=# create publication ptest for table t1 where (a is null);
> ERROR:  invalid publication WHERE expression for relation "t1"
> HINT:  only simple expressions using columns, constants and immutable
> system functions are allowed
>
> e.g.2
> test_pub=# create publication ptest for table t1 where (a is not null);
> ERROR:  invalid publication WHERE expression for relation "t1"
> HINT:  only simple expressions using columns, constants and immutable
> system functions are allowed
>
> So I think it would be useful to permit the NullTest also. Is it OK?
>

Yeah, I think such simple expressions should be okay but we need to
test left-side expressions for simplicity.

-- 
With Regards,
Amit Kapila.




add missing errdetail for xlogreader allocation failure error

2021-11-18 Thread Bharath Rupireddy
Hi,

It seems that some of the XLogReaderAllocate() failure checks are
missing the errdetail "Failed while allocating a WAL reading
processor." and carry just the errmsg "out of memory". The "out of
memory" message without the errdetail is too generic, so let's add it
for consistency and readability of the messages in the server logs.

Here's a tiny patch. Thoughts?

Regards,
Bharath Rupireddy.


v1-0001-add-missing-errdetail-for-xlogreader-allocation-f.patch
Description: Binary data


Re: Skipping logical replication transactions on subscriber side

2021-11-18 Thread Amit Kapila
On Thu, Nov 18, 2021 at 5:10 PM Masahiko Sawada  wrote:
>
> On Thu, Nov 18, 2021 at 5:45 PM tanghy.f...@fujitsu.com
>  wrote:
> >
> > On Tuesday, November 16, 2021 2:31 PM Masahiko Sawada 
> >  wrote:
> > >
> > > Right. I've fixed this issue and attached an updated patch.
> > >
> > >
> >
> > Thanks for your patch.
> >
> > I read the discussion about stats entries for table sync worker[1], the
> > statistics are retained after table sync worker finished its jobs and user 
> > can remove
> > them via pg_stat_reset_subscription_worker function.
> >
> > But I notice that, if a table sync worker finished its jobs, the error 
> > reported by
> > this worker will not be shown in the pg_stat_subscription_workers view. (It 
> > seemed caused by this condition: "WHERE srsubstate <> 'r'") Is it 
> > intentional? I think this may lead to users not knowing that the
> > statistics still exist, so they won't remove the statistics manually. And
> > that is not friendly to users' storage, right?
> >
>
> You're right. The condition "WHERE srsubstate <> 'r'" should be removed.
> I'll do that change in the next version patch. Thanks!
>

One more thing you might want to consider for the next version is
whether to rename the columns as discussed in the related thread [1].
I think we should consider future work and name them accordingly.

[1] - 
https://www.postgresql.org/message-id/CAA4eK1KR41bRUuPeNBSGv2%2Bq7ROKukS3myeAUqrZMD8MEwR0DQ%40mail.gmail.com

-- 
With Regards,
Amit Kapila.




RE: parallel vacuum comments

2021-11-18 Thread houzj.f...@fujitsu.com
On Tues, Nov 16, 2021 1:53 PM Masahiko Sawada  wrote:
> I've incorporated these comments and attached an updated patch.

Thanks for updating the patch.
I read the latest patch and have few comments.

1)
+/*
+ * lazy_vacuum_one_index() -- vacuum index relation.
...
+IndexBulkDeleteResult *
+vacuum_one_index(IndexVacuumInfo *ivinfo, IndexBulkDeleteResult *istat,


+ * vac_cleanup_one_index() -- do post-vacuum cleanup for index relation.
...
+IndexBulkDeleteResult *
+cleanup_one_index(IndexVacuumInfo *ivinfo, IndexBulkDeleteResult *istat)

The above function names seem different from the name mentioned in the function
header.

2)
 static void vacuum_error_callback(void *arg);

I noticed the patch changed the parallel worker's error callback function to
parallel_index_vacuum_error_callback(). The error message in the new callback
function seems a little different from the old one; was that intentional?


3)

+   /*
+* Reset all index status back to invalid (while checking that we have
+* processed all indexes).
+*/
+   for (int i = 0; i < pvs->nindexes; i++)
+   {
+   PVIndStats *stats = &(pvs->indstats[i]);
+
+   Assert(stats->status == INDVAC_STATUS_COMPLETED);
+   stats->status = INDVAC_STATUS_INITIAL;
+   }

Would it be safer to report an error if any index's status is not
INDVAC_STATUS_COMPLETED?

4)

Just a personal suggestion for the parallel-related function names, since
Andres wanted a uniform naming pattern. Maybe we can rename the following
functions:

end|begin_parallel_vacuum => parallel_vacuum_end|begin
perform_parallel_index_bulkdel|cleanup => parallel_vacuum_index_bulkdel|cleanup

so that all the parallel-related functions are named parallel_vacuum_xxx.

Best regards,
Hou zj



Re: Teach pg_receivewal to use lz4 compression

2021-11-18 Thread Michael Paquier
On Thu, Nov 18, 2021 at 07:54:37PM +0530, Jeevan Ladhe wrote:
> In dir_open_for_write() I observe that we are writing the header
> and then calling LZ4F_compressEnd() in case there is an error
> while writing the buffer to the file, and the output buffer of
> LZ4F_compressEnd() is not written anywhere. Why should this be
> necessary? To flush the contents of the internal buffer? But, then we
> are calling LZ4F_freeCompressionContext() immediately after the
> LZ4F_compressEnd() call. I might be missing something, will be
> happy to get more insights.

My concern here was symmetry, where IMO it makes sense to have a
compressEnd call each time there is a successful compressBegin call
done for the LZ4 state data, as there is no way to know if in the
future LZ4 won't change some of its internals to do more than just an
internal flush.
--
Michael


signature.asc
Description: PGP signature


Re: Skipping logical replication transactions on subscriber side

2021-11-18 Thread Greg Nancarrow
On Tue, Nov 16, 2021 at 5:31 PM Masahiko Sawada  wrote:
>
> Right. I've fixed this issue and attached an updated patch.
>

A couple of comments for the v23 patch:

doc/src/sgml/monitoring.sgml
(1) inconsistent description
I think that the following description seems inconsistent with the
previous description given above it in the patch (i.e. "One row per
subscription worker, showing statistics about errors that occurred on
that subscription worker"):

"The pg_stat_subscription_workers view will
contain one row per subscription error reported by workers applying
logical replication changes and workers handling the initial data copy
of the subscribed tables."

I think it is inconsistent because it implies there could be multiple
subscription error rows for the same worker.
Maybe the following wording could be used instead, or something similar:

"The pg_stat_subscription_workers view will
contain one row per subscription worker on which errors have occurred,
for workers applying logical replication changes and workers handling
the initial data copy of the subscribed tables."

(2) null vs NULL
The "subrelid" column description uses "null" but the "command" column
description uses "NULL".
I think "NULL" should be used for consistency.


Regards,
Greg Nancarrow
Fujitsu Australia




Re: pg_get_publication_tables() output duplicate relid

2021-11-18 Thread Amit Langote
On Thu, Nov 18, 2021 at 1:53 PM Amit Kapila  wrote:
> On Wed, Nov 17, 2021 at 11:39 AM Amit Langote  wrote:
> > What IS problematic is what a subscriber sees in the
> > pg_publication_tables view and the problem occurs only in the initial
> > sync phase, where the partition is synced duplicatively because of
> > being found in the view along with the parent in this case, that is,
> > when publish_via_partiiton_root is true.  I was saying on the other
> > thread [1] that we should leave it up to the subscriber to decide what
> > to do when the view (duplicatively) returns both the parent and the
> > partition, citing the use case that a subscriber may want to replicate
> > the parent and the partition as independent tables.  Though I now tend
> > to agree with Amit K that that may not be such a meaningful or all that
> > common use case, and the implementation on the subscriber side would
> > have to be unnecessarily complex.
> >
> > So maybe it makes sense to just do what has been proposed --
> > de-duplicate partitions out of the pg_publication_tables view,
>
> AFAICU, there are actually two problems related to
> pg_publication_tables view that are being discussed: (a) when
> 'publish_via_partition_root' is true then it returns both parent and
> child tables (provided both are part of publication), this can lead to
> duplicate data on the subscriber; the fix for this is being discussed
> in thread [1]. (b) when 'publish_via_partition_root' is false, then it
> returns duplicate entries for child tables (provided both are part of
> publication), this problem is being discussed in this thread.
>
> As per the proposed patches, it seems both need a separate fix, we can
> fix them together if we want but I am not sure. Is your understanding
> the same?

Actually, I'm seeing both (a) and (b) as more or less the same thing.
If publish_via_partition_root=true, we'd want to remove a partition
from the list if the root parent is also present.  If false, we'd want
to remove a partition from the list because it'd also be added by way
of expanding the root.

I know that (a) causes the partition-double-sync problem that could be
fixed by partition OID de-duplication as proposed in the other thread.
As a comment on that, it'd be nice to see a test case added to the
src/test/subscription suite in that patch, because
partition-double-sync is the main problem statement, I'd think.

Like Bharath said, I don't see (b) as much of a problem, because the
subscriber applies DISTINCT when querying pg_publication_tables()
anyway.  Other users, if any out there, may be doing the same by
following core subscription code's example.  That said, I am not
opposed to applying the patch being proposed here, though I guess we
don't have any src/test/subscription test case for this one?  As in,
do we know of any replication (initial/streaming) misbehavior caused
by the duplicate partition OIDs in this case or is the only problem
that pg_publication_tables output looks odd?

> > unless
> > we know of a bigger problem that requires us to hack the subscriber
> > side of things too.
> >
>
> There is yet another issue that might need subscriber side change. See
> the second issue summarized by Hou-San in the email[2]. I feel it is
> better to tackle that separately.
>
> >  Actually, I came to know of one such problem
> > while thinking about this today: when you ATTACH a partition to a
> > table that is present in a publish_via_partition_root=true
> > publication, it doesn't get copied via the initial sync even though
> > subsequent replication works just fine.  The reason for that is that
> > the subscriber only syncs the partitions that are known at the time
> > when the parent is first synced, that too via the parent (as SELECT
> >  FROM parent), and then marks the parent as sync-done.
> > Refreshing the subscription after ATTACHing doesn't help, because the
> > subscriber can't see any partitions to begin with.
> >
>
> Sorry, it is not clear to me how this can lead to a problem. Even if
> we just have the root table in the subscriber at the time of sync
> later shouldn't copy get the newly attached partition's data and
> replicate it to the required partition for
> "publish_via_partition_root=true" case?

Not sure if you thought otherwise but I am talking about attaching a
new partition into the partition tree on the *publication side*.

The problematic case is attaching the partition *after* the subscriber
has already marked the root parent as synced and/or ready for
replication.  Refreshing the subscription doesn't help it discover the
newly attached partition, because a publish_via_partition_root only
ever tells about the root parent, which would be already synced, so
the subscriber would think there's nothing to copy.

> Anyway, if this is a problem
> we need to figure the solution for this separately.

Sure, we might need to do that after all.  Though it might be a good
idea to be sure that we won't need to reconsider the fix we push for
the 

Re: row filtering for logical replication

2021-11-18 Thread Peter Smith
On Mon, Nov 15, 2021 at 9:31 PM Amit Kapila  wrote:
>
...
>
> Few comments on the latest set of patches (v39*)
> ===
> 0001*
> 1.
>  ObjectAddress
> -publication_add_relation(Oid pubid, PublicationRelInfo *targetrel,
> +publication_add_relation(Oid pubid, PublicationRelInfo *pri,
>   bool if_not_exists)
>  {
>   Relation rel;
>   HeapTuple tup;
>   Datum values[Natts_pg_publication_rel];
>   bool nulls[Natts_pg_publication_rel];
> - Oid relid = RelationGetRelid(targetrel->relation);
> + Relationtargetrel = pri->relation;
>
> I don't think such a renaming (targetrel-->pri) is warranted for this
> patch. If we really want something like this, we can probably do it in
> a separate patch but I suggest we can do that as a separate patch.
>

The name "targetrel" implies it is a Relation. (and historically, this
arg once was "Relation *targetrel").

Then when the PublicationRelInfo struct was introduced the arg name
was not changed and it became "PublicationRelInfo *targetrel". But at
that time PublicationRelInfo was just a simple wrapper for a Relation
so that was probably ok.

But now this Row-Filter patch has added more new members to
PublicationRelInfo, so IMO the name change is helpful otherwise it
seems misleading to continue calling it like it was still just a
Relation.

--
Kind Regards,
Peter Smith.
Fujitsu Australia




Re: pg_upgrade parallelism

2021-11-18 Thread Bruce Momjian
On Wed, Nov 17, 2021 at 02:44:52PM -0500, Jaime Casanova wrote:
> Hi,
> 
> Currently docs about pg_upgrade says:
> 
> """
> 
>  The --jobs option allows multiple CPU cores to be used
>  for copying/linking of files and to dump and reload database schemas
>  in parallel;  a good place to start is the maximum of the number of
>  CPU cores and tablespaces.  This option can dramatically reduce the
>  time to upgrade a multi-database server running on a multiprocessor
>  machine.
> 
> """
> 
> Which makes the user think that the --jobs option could use all CPU
> cores. Which is not true. Or that it has anything to do with multiple
> databases, which is true only to some extent.

Uh, the behavior is a little more complicated.  The --jobs option in
pg_upgrade is used to parallelize three operations:

*  copying relation files

*  dumping old cluster objects (via parallel_exec_prog())

*  creating objects in the new cluster (via parallel_exec_prog())

The last two basically operate on databases in parallel --- they can't
dump/load a single database in parallel, but they can dump/load several
databases in parallel.

The documentation you quote above is saying that you set jobs based on
the number of CPUs (for dump/reload which are assumed to be CPU bound)
and the number of tablespaces (which is assumed to be I/O bound).

I am not sure how we can improve that text.  We could just say the max
of the number of databases and tablespaces, but then the number of CPUs
needs to be involved since, if you only have one CPU core, you don't
want parallel dumps/loads happening since that will just cause CPU
contention with little benefit.  We mention tablespaces because even if
you only have one CPU core, since tablespace copying is I/O bound, you
can still benefit from --jobs.

> What that option really improves are upgrading servers with multiple
> tablespaces, of course if --link or --clone are used pg_upgrade is still
> very fast but used with the --copy option is not what one could expect.
> 
> As an example, a customer with a 25Tb database, 40 cores and lots of ram
> used --jobs=35 and got only 7 processes (they have 6 tablespaces) and
> the disks were not used at maximum speed either. They expected 35
> processes copying lots of files at the same time.
> 
> So, first I would like to improve documentation. What about something
> like the attached? 
> 
> Now, a couple of questions:
> 
> - in src/bin/pg_upgrade/file.c at copyFile() we define a buffer to
>   determine the amount of bytes that should be used in read()/write() to
>   copy the relfilenode segments. And we define it as (50 * BLCKSZ),
>   which is 400kB. Isn't this too small?

Uh, if you find that increasing that helps, we can increase it --- I
don't know how that value was chosen.  However, we are really just
copying the data into the kernel, not forcing it to storage, so I don't
know if a larger value would help.

> - why we read()/write() at all? is not a faster way of copying the file?
>   i'm asking that because i don't actually know.

Uh, we could use buffered I/O, I guess, but again, would there be a
benefit?

> I'm trying to add more parallelism by copying individual segments
> of a relfilenode in different processes. Does anyone see a big
> problem in trying to do that? I'm asking because no one did it before,
> that could not be a good sign.

I think we were assuming the copy would be I/O bound and that
parallelism wouldn't help in a single tablespace.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.





Re: Showing I/O timings spent reading/writing temp buffers in EXPLAIN

2021-11-18 Thread Melanie Plageman
On Sun, Aug 22, 2021 at 9:47 PM Masahiko Sawada  wrote:
>
> On Thu, Aug 19, 2021 at 10:52 PM Ranier Vilela  wrote:
> >
> > Em qui., 19 de ago. de 2021 às 09:21, Masahiko Sawada 
> >  escreveu:
>
> > The presentation seems a little confusing, wouldn't it be better?
> >
> > I/O Timings: shared/local read=xxx write=xxx temp read=0.487 write=2.073
>
> Yeah, it looks better to add "shared/local".

 Using the patch, I do feel like the EXPLAIN format of

 shared/local xxx, temp xxx

 is a bit confusing. If temp is going to be its own EXPLAIN IO timing
 category (as opposed to being summed with relation data block IO from
 local and shared buffers), then it seems like local and shared should
 be separated as well.

 shared xxx, local xxx, temp xxx

 With your patch applied, below is the top of the EXPLAIN output for a
 query joining a temporary table (so using local buffers) to a regular
 table (using shared buffers) and spilling the hash join (temp files).

Aggregate (actual rows=1 loops=1)
  Buffers: shared read=4425, local read=4425 dirtied=4425
written=4423, temp read=5963 written=5963
  I/O Timings: shared/local read=23.546, temp read=13.309 write=74.198

 I found that using the same terminology as the "EXPLAIN BUFFERS" output
 but not using the same categories was kind of confusing.

 If it is only meaningful to distinguish between relation data IO and
 query temp file IO, then maybe the words used in I/O Timings in EXPLAIN
 should be "rel data" and "temp" or something like that.

- Melanie




Re: LogwrtResult contended spinlock

2021-11-18 Thread Alvaro Herrera
Here's a further attempt at this.  Sorry it took so long.
In this version, I replaced the coupled-in-a-struct representation of
Write and Flush with two separate global variables.  The reason to do this
is to cater to Andres' idea of keeping them up-to-date separately.  Of
course, I could have kept them together, but it seems more sensible this way.

Andres also suggested to not use monotonic-advance in XLogWrite, because
these updates are always done with WALWriteLock held in exclusive mode.
However, that doesn't work (and tests fail pretty quickly) because
XLogWrite seems to be called possibly out of order, so you still have to
check if the current value is newer than what you have to write anyway;
using stock atomic write doesn't work.

So this is v4.

-- 
Álvaro Herrera   39°49'30"S 73°17'W  —  https://www.EnterpriseDB.com/
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 1616448368..ad74862cb6 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -382,16 +382,14 @@ static XLogRecPtr RedoStartLSN = InvalidXLogRecPtr;
  *
  * LogwrtRqst indicates a byte position that we need to write and/or fsync
  * the log up to (all records before that point must be written or fsynced).
- * LogwrtResult indicates the byte positions we have already written/fsynced.
+ * LogWriteResult, LogFlushResult indicate the byte positions we have already
+ * written/fsynced.
  * These structs are identical but are declared separately to indicate their
  * slightly different functions.
  *
- * To read XLogCtl->LogwrtResult, you must hold either info_lck or
- * WALWriteLock.  To update it, you need to hold both locks.  The point of
- * this arrangement is that the value can be examined by code that already
- * holds WALWriteLock without needing to grab info_lck as well.  In addition
- * to the shared variable, each backend has a private copy of LogwrtResult,
- * which is updated when convenient.
+ * XLogCtl->LogwrtResult is read and written using atomic operations.
+ * In addition to the shared variable, each backend has a private copy of
+ * LogwrtResult, each member of which is separately updated when convenient.
  *
  * The request bookkeeping is simpler: there is a shared XLogCtl->LogwrtRqst
  * (protected by info_lck), but we don't need to cache any copies of it.
@@ -414,18 +412,18 @@ static XLogRecPtr RedoStartLSN = InvalidXLogRecPtr;
  *--
  */
 
+typedef struct XLogwrtAtomic
+{
+	pg_atomic_uint64 Write;		/* last byte + 1 of write position */
+	pg_atomic_uint64 Flush;		/* last byte + 1 of flush position */
+} XLogwrtAtomic;
+
 typedef struct XLogwrtRqst
 {
 	XLogRecPtr	Write;			/* last byte + 1 to write out */
 	XLogRecPtr	Flush;			/* last byte + 1 to flush */
 } XLogwrtRqst;
 
-typedef struct XLogwrtResult
-{
-	XLogRecPtr	Write;			/* last byte + 1 written out */
-	XLogRecPtr	Flush;			/* last byte + 1 flushed */
-} XLogwrtResult;
-
 /*
  * Inserting to WAL is protected by a small fixed number of WAL insertion
  * locks. To insert to the WAL, you must hold one of the locks - it doesn't
@@ -577,6 +575,7 @@ typedef struct XLogCtlData
 {
 	XLogCtlInsert Insert;
 
+	XLogwrtAtomic LogwrtResult; /* uses atomics */
 	/* Protected by info_lck: */
 	XLogwrtRqst LogwrtRqst;
 	XLogRecPtr	RedoRecPtr;		/* a recent copy of Insert->RedoRecPtr */
@@ -594,12 +593,6 @@ typedef struct XLogCtlData
 	pg_time_t	lastSegSwitchTime;
 	XLogRecPtr	lastSegSwitchLSN;
 
-	/*
-	 * Protected by info_lck and WALWriteLock (you must hold either lock to
-	 * read it, but both to update)
-	 */
-	XLogwrtResult LogwrtResult;
-
 	/*
 	 * Latest initialized page in the cache (last byte position + 1).
 	 *
@@ -770,10 +763,11 @@ static ControlFileData *ControlFile = NULL;
 static int	UsableBytesInSegment;
 
 /*
- * Private, possibly out-of-date copy of shared LogwrtResult.
+ * Private, possibly out-of-date copy of shared XLogCtl->LogwrtResult.
  * See discussion above.
  */
-static XLogwrtResult LogwrtResult = {0, 0};
+static XLogRecPtr LogWriteResult = 0;
+static XLogRecPtr LogFlushResult = 0;
 
 /*
  * Codes indicating where we got a WAL file from during recovery, or where
@@ -1201,8 +1195,6 @@ XLogInsertRecord(XLogRecData *rdata,
 		/* advance global request to include new block(s) */
 		if (XLogCtl->LogwrtRqst.Write < EndPos)
 			XLogCtl->LogwrtRqst.Write = EndPos;
-		/* update local result copy while I have the chance */
-		LogwrtResult = XLogCtl->LogwrtResult;
 		SpinLockRelease(&XLogCtl->info_lck);
 	}
 
@@ -2174,6 +2166,7 @@ AdvanceXLInsertBuffer(XLogRecPtr upto, TimeLineID tli, bool opportunistic)
 	 * Now that we have the lock, check if someone initialized the page
 	 * already.
 	 */
	LogWriteResult = pg_atomic_read_u64(&XLogCtl->LogwrtResult.Write);
 	while (upto >= XLogCtl->InitializedUpTo || opportunistic)
 	{
 		nextidx = XLogRecPtrToBufIdx(XLogCtl->InitializedUpTo);
@@ -2184,7 +2177,7 @@ AdvanceXLInsertBuffer(XLogRecPtr upto, TimeLineID tli, bool opportunistic)
 		 * already written out.
 		 */
 	

Re: Mixing CC and a different CLANG seems like a bad idea

2021-11-18 Thread Tom Lane
Andres Freund  writes:
> On 2021-11-18 13:39:04 -0500, Tom Lane wrote:
>> More, I think (though this ought to be documented in a comment) that
>> the policy is to not bother turning on extra -W options in the bitcode
>> switches, on the grounds that warning once in the main build is enough.
>> I follow that idea --- but what we missed is that we still need to
>> turn *off* the warnings we're actively disabling.  I shall go do that,
>> if no objections.

> Thanks for doing that, that does sounds like a good way, at least for now.

Cool, thanks for confirming.

For the archives' sake: I thought originally that this was triggered
by having CC different from CLANG, and even wrote that in the commit
message; but I was mistaken.  I was misled by the fact that sidewinder
is the only animal still reporting the compound-token-split-by-macro
warnings, and jumped to the conclusion that its unusual configuration
was the cause.  But actually that's not it, because the flags we
feed to CLANG are *not* dependent on what CC will take.  I now think
the actual uniqueness is that sidewinder is the only animal that is
using clang >= 12 and has --with-llvm enabled.

>> (BTW, does meson have any comparable optimization?
>> If it doesn't, I bet that is going to be a problem.)

> Yes - imo in a nicer, more reliable way. It caches most test results, but with
> the complete input, including the commandline (i.e. compiler flags) as the
> cache key. So no more errors about compile flags changing...

Nice!

regards, tom lane




Re: xlog.c: removing ReadRecPtr and EndRecPtr

2021-11-18 Thread Tom Lane
Robert Haas  writes:
> On Thu, Nov 18, 2021 at 3:14 PM Tom Lane  wrote:
>> I'm a little dubious that this test case is valuable enough to
>> mess around with a nonstandard postmaster startup protocol, though.

> The problem that I have with the present situation is that the test
> coverage of xlog.c is pretty abysmal.

Agreed, but this one test case isn't going to move the needle much.
To get to reasonable coverage we're going to need more tests, and
I definitely don't want multiple versions of ad-hoc postmaster startup
code.  If we need that, it'd be smarter to extend Cluster.pm so that
the mainline code could do what's needful.

Having said that, it wasn't entirely clear to me why you needed a
weird startup method.  Why couldn't you drop the bogus history file
into place *before* starting the charlie postmaster?  If you have
to do it after, aren't there race/timing problems anyway?

regards, tom lane




Re: row filtering for logical replication

2021-11-18 Thread Peter Smith
On Thu, Nov 18, 2021 at 4:32 PM Peter Smith  wrote:
>
> On Mon, Nov 15, 2021 at 9:31 PM Amit Kapila  wrote:
> >
> > On Wed, Nov 10, 2021 at 12:36 PM Peter Smith  wrote:
> > >
> > > On Mon, Nov 8, 2021 at 5:53 PM houzj.f...@fujitsu.com
> > >  wrote:
> > > >
> > > > 3) v37-0005
> > > >
> > > > - no parse nodes of any kind other than Var, OpExpr, Const, BoolExpr, 
> > > > FuncExpr
> > > >
> > > > I think there could be other node type which can also be considered as 
> > > > simple
> > > > expression, for exmaple T_NullIfExpr.
> > >
> > > The current walker restrictions are from a previously agreed decision
> > > by Amit/Tomas [1] and from an earlier suggestion from Andres [2] to
> > > keep everything very simple for a first version.
> > >
> > > Yes, you are right, there might be some additional node types that
> > > might be fine, but at this time I don't want to add anything different
> > > without getting their approval to do so. Anyway, additions like this
> > > are all candidates for a future version of this row-filter feature.
> > >
> >
> > I think we can consider T_NullIfExpr unless you see any problem with the 
> > same.
>
> Added in v40 [1]
>

I've noticed that row-filters that are testing NULL cannot pass the
current expression validation restrictions.

e.g.1
test_pub=# create publication ptest for table t1 where (a is null);
ERROR:  invalid publication WHERE expression for relation "t1"
HINT:  only simple expressions using columns, constants and immutable
system functions are allowed

e.g.2
test_pub=# create publication ptest for table t1 where (a is not null);
ERROR:  invalid publication WHERE expression for relation "t1"
HINT:  only simple expressions using columns, constants and immutable
system functions are allowed

So I think it would be useful to permit the NullTest also. Is it OK?

--
Kind Regards,
Peter Smith.
Fujitsu Australia




Re: Mixing CC and a different CLANG seems like a bad idea

2021-11-18 Thread Andres Freund
Hi,

On 2021-11-18 16:13:50 -0500, Robert Haas wrote:
> On Thu, Nov 18, 2021 at 3:43 PM Andres Freund  wrote:
> > Hi,
> >
> > Greetings,
> >
> > Andres Freund
> 
> Greetings to you too, Andres. :-)

Oops I sent the email that I copied text from, rather than the one I wanted to
send...

Greetings,

Andres Freund




Re: Mixing CC and a different CLANG seems like a bad idea

2021-11-18 Thread Andres Freund
Hi,

On 2021-11-18 13:39:04 -0500, Tom Lane wrote:
> After studying configure's list more closely, that doesn't seem like
> a great plan either.  There's a lot of idiosyncrasy in the tests,
> such as things that only apply to C or to C++.

Yea. It seems doable, but not really worth it for now.


> More, I think (though this ought to be documented in a comment) that
> the policy is to not bother turning on extra -W options in the bitcode
> switches, on the grounds that warning once in the main build is enough.
> I follow that idea --- but what we missed is that we still need to
> turn *off* the warnings we're actively disabling.  I shall go do that,
> if no objections.

Thanks for doing that, that does sounds like a good way, at least for now.


On 2021-11-18 13:39:04 -0500, Tom Lane wrote:
> That sounds fairly horrid, but as long as you are using an autoconf cache file it's
> not really going to cost that much.

> (BTW, does meson have any comparable optimization?
> If it doesn't, I bet that is going to be a problem.)

Yes - imo in a nicer, more reliable way. It caches most test results, but with
the complete input, including the commandline (i.e. compiler flags) as the
cache key. So no more errors about compile flags changing...

Greetings,

Andres Freund




Re: Time to drop plpython2?

2021-11-18 Thread Andrew Dunstan


On 11/16/21 11:26, Andrew Dunstan wrote:
>
> My other machine with an old python instance is bowerbird. It has python
> 3.4 installed but not used, alongside 2.7 which is used. I will install
> the latest and see if that can be made to work.
>
>

bowerbird is now building with python 3.10


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: terminate called after throwing an instance of 'std::bad_alloc' (llvmjit)

2021-11-18 Thread Justin Pryzby
On Wed, Nov 10, 2021 at 09:56:44AM -0600, Justin Pryzby wrote:
> Thread starting here:
> https://www.postgresql.org/message-id/20201001021609.GC8476%40telsasoft.com
> 
> On Fri, Dec 18, 2020 at 05:56:07PM -0600, Justin Pryzby wrote:
> > I'm 99% sure the "bad_alloc" is from LLVM.  It happened multiple times on
> > different servers (running a similar report) after setting jit=on during 
> > pg13
> > upgrade, and never happened since re-setting jit=off.
> 
> Since this recurred a few times recently (now running pg14.0), and I finally
> managed to get a non-truncated corefile...

I think the reason this recurred is that, since upgrading to pg14, I no longer
had your memleak patches applied.  I'd forgotten about it, but before the
upgrade I was probably running a locally compiled postgres with your
patches applied.

I should've mentioned that this crash was associated with the message from the
original problem report:

|terminate called after throwing an instance of 'std::bad_alloc'
|  what():  std::bad_alloc

The leak discussed on other threads seems fixed by your patches - I compiled
v14 and now running with no visible leaks since last week.
https://www.postgresql.org/message-id/flat/20210417021602.7dilihkdc7obl...@alap3.anarazel.de

As I understand it, there's still an issue with an allocation failure causing
SIGABRT rather than FATAL.

It took me several tries to get the corefile since the process is huge, caused
by the leak (and abrtd wanted to truncate it, nullifying its utility).

-rw---. 1 postgres postgres 8.4G Nov 10 08:57 
/var/lib/pgsql/14/data/core.31345

I installed more debug packages to get a fuller stacktrace.

#0  0x7f2497880337 in raise () from /lib64/libc.so.6
No symbol table info available.
#1  0x7f2497881a28 in abort () from /lib64/libc.so.6
No symbol table info available.
#2  0x7f2487cbf265 in __gnu_cxx::__verbose_terminate_handler() () from 
/usr/lib64/llvm5.0/lib/libLLVM-5.0.so
No symbol table info available.
#3  0x7f2487c66696 in __cxxabiv1::__terminate(void (*)()) () from 
/usr/lib64/llvm5.0/lib/libLLVM-5.0.so
No symbol table info available.
#4  0x7f2487c666c3 in std::terminate() () from 
/usr/lib64/llvm5.0/lib/libLLVM-5.0.so
No symbol table info available.
#5  0x7f2487c687d3 in __cxa_throw () from 
/usr/lib64/llvm5.0/lib/libLLVM-5.0.so
No symbol table info available.
#6  0x7f2487c686cd in operator new(unsigned long) () from 
/usr/lib64/llvm5.0/lib/libLLVM-5.0.so
No symbol table info available.
#7  0x7f2486477b9c in allocateBuckets (this=0x2ff7f38, this=0x2ff7f38, 
Num=) at 
/usr/src/debug/llvm-5.0.1.src/include/llvm/ADT/DenseMap.h:753
No locals.
#8  llvm::DenseMap >, llvm::DenseMapAPIntKeyInfo, 
llvm::detail::DenseMapPair > > >::grow (this=this@entry=0x2ff7f38, 
AtLeast=)
at /usr/src/debug/llvm-5.0.1.src/include/llvm/ADT/DenseMap.h:691
OldNumBuckets = 33554432
OldBuckets = 0x7f23f3e42010
#9  0x7f2486477f29 in grow (AtLeast=, this=0x2ff7f38) at 
/usr/src/debug/llvm-5.0.1.src/include/llvm/ADT/DenseMap.h:461
No locals.
#10 InsertIntoBucketImpl (TheBucket=, Lookup=..., 
Key=..., this=0x2ff7f38) at 
/usr/src/debug/llvm-5.0.1.src/include/llvm/ADT/DenseMap.h:510
NewNumEntries = 
EmptyKey = 
#11 InsertIntoBucket (Key=..., TheBucket=, 
this=0x2ff7f38) at /usr/src/debug/llvm-5.0.1.src/include/llvm/ADT/DenseMap.h:471
No locals.
#12 FindAndConstruct (Key=..., this=0x2ff7f38) at 
/usr/src/debug/llvm-5.0.1.src/include/llvm/ADT/DenseMap.h:271
TheBucket = 
#13 operator[] (Key=..., this=0x2ff7f38) at 
/usr/src/debug/llvm-5.0.1.src/include/llvm/ADT/DenseMap.h:275
No locals.
#14 llvm::ConstantInt::get (Context=..., V=...) at 
/usr/src/debug/llvm-5.0.1.src/lib/IR/Constants.cpp:550
pImpl = 0x2ff7eb0
#15 0x7f2486478263 in llvm::ConstantInt::get (Ty=0x2ff85a8, V=, isSigned=isSigned@entry=false) at 
/usr/src/debug/llvm-5.0.1.src/lib/IR/Constants.cpp:571
No locals.
#16 0x7f248648673d in LLVMConstInt (IntTy=, N=, SignExtend=SignExtend@entry=0) at 
/usr/src/debug/llvm-5.0.1.src/lib/IR/Core.cpp:952
No locals.
#17 0x7f2488f66c18 in l_ptr_const (type=0x3000650, ptr=) at 
../../../../src/include/jit/llvmjit_emit.h:29
c = 
#18 llvm_compile_expr (state=) at llvmjit_expr.c:246
op = 0x1a5317690
opcode = EEOP_OUTER_VAR
opno = 5
parent = 
funcname = 0x1a53184e8 "evalexpr_4827_151"
context = 0x1ba79b8
b = 
mod = 0x1a5513d30
eval_fn = 
entry = 
v_state = 0x1a5ce09e0
v_econtext = 0x1a5ce0a08
v_isnullp = 0x1a5ce0a30
v_tmpvaluep = 0x1a5ce0aa8
v_tmpisnullp = 0x1a5ce0b48
starttime = {tv_sec = 10799172, tv_nsec = 781670770}
endtime = {tv_sec = 7077194792, tv_nsec = 0}
__func__ = "llvm_compile_expr"
[...]




Re: Mixing CC and a different CLANG seems like a bad idea

2021-11-18 Thread Robert Haas
On Thu, Nov 18, 2021 at 3:43 PM Andres Freund  wrote:
> Hi,
>
> Greetings,
>
> Andres Freund

Greetings to you too, Andres. :-)

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: xlog.c: removing ReadRecPtr and EndRecPtr

2021-11-18 Thread Robert Haas
On Thu, Nov 18, 2021 at 3:14 PM Tom Lane  wrote:
> Robert Haas  writes:
> > There's a second place where the patch needs to wait for something
> > also, and that one I've crudely kludged with sleep(10). If anybody
> > around here who is good at figuring out how to write clever TAP tests
> > can tell me how to fix this test to be non-stupid, I will happily do
> > so.
>
> As far as that goes, if you conceptualize it as "wait for this text
> to appear in the log file", there's prior art in existing TAP tests.
> Basically, sleep for some reasonable short period and check the
> log file; if not there, repeat until timeout.

Yeah, there's something like that in the form of find_in_log in
019_replslot_init.pl. I thought about copying that, but that didn't
seem great, and I also thought about trying to move it into a common
module, which seems maybe better but also more work, and thus not
worth doing unless we have agreement that it's what we should do.

> I'm a little dubious that this test case is valuable enough to
> mess around with a nonstandard postmaster startup protocol, though.
> The main reason I dislike that idea is that any fixes we apply to
> the TAP tests' normal postmaster-startup code would almost inevitably
> miss fixing this test.  IIRC there have been security-related fixes in
> that area (e.g. where do we put the postmaster's socket), so I find
> that prospect pretty scary.

The problem that I have with the present situation is that the test
coverage of xlog.c is pretty abysmal. It actually doesn't look bad if
you just run a coverage report, but there are a shazillion flag
variables in that file and bugs like this make it quite clear that we
don't come close to testing all the possible combinations. It's really
borderline unmaintainable. I don't know whether there's a specific
individual who wrote most of this code and didn't get the memo that
global variables are best avoided, or whether this is sort of case
where we started over 1 or 2 and then it just gradually ballooned into
the giant mess that it now is, but the present situation is pretty
outrageous. It's taking me weeks of time to figure out how to make
changes that would normally take days, or maybe hours. We clearly need
to try both to get the number of cases under control by eliminating
stupid variables that are almost but not quite the same as something
else, and also get proper test coverage for the things that remain so
that it's possible to modify the code without excessive risk of
shooting ourselves in the foot.

That said, I'm not wedded to this particular test case, either. It's
an extremely specific bug that is unlikely to reappear once squashed,
and making the test case robust enough to avoid having the buildfarm
complain seems fraught.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: [RFC] building postgres with meson

2021-11-18 Thread Thomas Munro
On Tue, Nov 16, 2021 at 8:23 AM Andres Freund  wrote:
> On 2021-11-15 14:11:25 -0500, Tom Lane wrote:
> > (In fact, unless somebody renews fossa/husky's
> > icc license, the three xlc animals will be an outright majority of
> > them, because wrasse and anole are the only other active animals with
> > non-mainstream compilers.)
>
> It should probably be doable to get somebody to run another icc animal. Icc is
> supported by meson, fwiw.

FWIW, in case someone is interested in bringing ICC back to the farm,
some light googling tells me that newer editions of "classic" ICC (as
opposed to "data parallel" ICC, parts of some kind of rebrand) no
longer require regular licence bureaucracy, and can be installed in
modern easier to maintain ways.  For example, I see that some people
add Intel's APT repository and apt-get install the compiler inside CI
jobs, on Ubuntu.




Re: Mixing CC and a different CLANG seems like a bad idea

2021-11-18 Thread Andres Freund
Hi,

On 2021-11-18 12:43:15 -0500, Tom Lane wrote:
> Yeah.  I'm inclined to think we ought to just bite the bullet and fold
> CLANG/CLANGXX into the main list of compiler switch probes, so that we
> check every interesting one four times.  That sounds fairly horrid,
> but as long as you are using an accache file it's not really going
> to cost that much.  (BTW, does meson have any comparable optimization?
> If it doesn't, I bet that is going to be a problem.)
> 
>   regards, tom lane
Greetings,

Andres Freund




Re: Should rename "startup process" to something else?

2021-11-18 Thread Tom Lane
Alvaro Herrera  writes:
> If we change the name, and I support the idea that we do, I think a
> good name would be "wal replay".  I think "recovery" is not great
> precisely because in a standby there is likely no crash that we're
> recovering from.

Fair point.

> The word "replay" is at odds with the other names,
> which stand for the device that carries out the task at hand
> (checkpointer, bgwriter, wal sender/receiver); but the word "replayer"
> seems to be extremely uncommon and IMO looks strange.  If you see a
> process that claims to be "wal replay", you know perfectly well what it
> is.

I'm less concerned about the "er" than about the fact that the name is
two words.  People will immediately shorten it to just "replay", eg
as a part of names in the code, and I feel that that's confusing in
its own way.  Maybe we could run the words together, on the precedent
of "walreceiver", but I never much liked that name either.

regards, tom lane




Re: xlog.c: removing ReadRecPtr and EndRecPtr

2021-11-18 Thread Tom Lane
Robert Haas  writes:
> There's a second place where the patch needs to wait for something
> also, and that one I've crudely kludged with sleep(10). If anybody
> around here who is good at figuring out how to write clever TAP tests
> can tell me how to fix this test to be non-stupid, I will happily do
> so.

As far as that goes, if you conceptualize it as "wait for this text
to appear in the log file", there's prior art in existing TAP tests.
Basically, sleep for some reasonable short period and check the
log file; if not there, repeat until timeout.
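[For illustration, a retry loop of that shape might look like the sketch
below. The helper name, log text, and timeouts are made up for the example
and not taken from any committed TAP test; the demonstration polls a scratch
file standing in for the postmaster log.]

```shell
# Poll a log file for a pattern, retrying once per second until a timeout.
# Returns 0 as soon as the pattern appears, 1 if the timeout elapses.
wait_for_log() {
  local file=$1 pattern=$2 timeout=${3:-30} waited=0
  while [ "$waited" -lt "$timeout" ]; do
    if grep -q "$pattern" "$file" 2>/dev/null; then
      return 0
    fi
    sleep 1
    waited=$((waited + 1))
  done
  return 1
}

# Demonstration: the "log" gets its line two seconds after we start waiting.
log=$(mktemp)
( sleep 2; echo "redo done at 0/1234567" >>"$log" ) &
wait_for_log "$log" "redo done" 10 && echo "pattern found"
wait
rm -f "$log"
```

The same shape works whether the short period is a fixed one-second sleep, as
here, or something adaptive; the important part is bounding the total wait so
slow buildfarm machines time out rather than hang.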

I'm a little dubious that this test case is valuable enough to
mess around with a nonstandard postmaster startup protocol, though.
The main reason I dislike that idea is that any fixes we apply to
the TAP tests' normal postmaster-startup code would almost inevitably
miss fixing this test.  IIRC there have been security-related fixes in
that area (e.g. where do we put the postmaster's socket), so I find
that prospect pretty scary.

regards, tom lane




Re: xlog.c: removing ReadRecPtr and EndRecPtr

2021-11-18 Thread Robert Haas
On Thu, Nov 18, 2021 at 2:21 PM Alvaro Herrera  wrote:
> Would it work to start postmaster directly instead of using pg_ctl, and
> then rely on (say) pg_isready?

I *think* that pg_isready would also fail, because the documentation
says "pg_isready returns 0 to the shell if the server is accepting
connections normally, 1 if the server is rejecting connections (for
example during startup) ..." and I believe the "during startup" case
would apply here.

Starting postmaster directly is a thought. Is there any existing
precedent for that approach?

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: xlog.c: removing ReadRecPtr and EndRecPtr

2021-11-18 Thread Alvaro Herrera
On 2021-Nov-18, Robert Haas wrote:

> Unfortunately, this test case isn't remotely committable as-is, and I
> don't know how to make it so. The main problem is that, although you
> can start up a server with nothing in pg_wal, no restore_command, and
> no archive command, pg_ctl will not believe that it has started.

Would it work to start postmaster directly instead of using pg_ctl, and
then rely on (say) pg_isready?
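[A rough sketch of that arrangement. The binary names, data directory, and
retry budget are assumptions; the demonstration at the end substitutes a
trivial probe, since no server is running here.]

```shell
# Generic "retry until the probe succeeds" helper: runs the given command
# once per second, up to $1 times; succeeds as soon as the command does.
wait_until() {
  local tries=$1
  shift
  while [ "$tries" -gt 0 ]; do
    if "$@"; then
      return 0
    fi
    tries=$((tries - 1))
    sleep 1
  done
  return 1
}

# Intended use (not run here; paths and port are hypothetical):
#   postgres -D "$PGDATA" -p 5432 &
#   wait_until 60 pg_isready -q -h localhost -p 5432
# Stand-in demonstration with a probe that succeeds immediately:
wait_until 3 true && echo "probe succeeded"
```

Note pg_isready exits nonzero while the server is still rejecting
connections during startup, which is exactly why the loop keeps retrying
instead of checking once.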

-- 
Álvaro Herrera   39°49'30"S 73°17'W  —  https://www.EnterpriseDB.com/
"He who lives for the future is a dreamer, and he who lives for the past,
a fool" (Luis Adler, "Los tripulantes de la noche")




Re: xlog.c: removing ReadRecPtr and EndRecPtr

2021-11-18 Thread Robert Haas
On Thu, Nov 18, 2021 at 7:30 AM Amul Sul  wrote:
> Somehow with and without the patch I am getting the same log.

Try applying the attached 0001-dubious-test-case.patch and see
if that fails. It does for me. If so, then try applying
0002-fix-the-bug.patch and see if that makes it pass.

Unfortunately, this test case isn't remotely committable as-is, and I
don't know how to make it so. The main problem is that, although you
can start up a server with nothing in pg_wal, no restore_command, and
no archive command, pg_ctl will not believe that it has started. I
worked around that problem by telling pg_ctl to ignore failures, but
it still waits for a long time before timing out, which sucks both
because (1) hackers are impatient and (2) some hackers run extremely
slow buildfarm machines where almost any timeout won't be long enough.
There's a second place where the patch needs to wait for something
also, and that one I've crudely kludged with sleep(10). If anybody
around here who is good at figuring out how to write clever TAP tests
can tell me how to fix this test to be non-stupid, I will happily do
so.

Otherwise, I think I will just need to commit and back-patch the
actual bug fix without a test, and then rebase the rest of the patch I
posted previously over top of those changes for master only.

-- 
Robert Haas
EDB: http://www.enterprisedb.com


0001-dubious-test-case.patch
Description: Binary data


0002-fix-the-bug.patch
Description: Binary data


Re: Should rename "startup process" to something else?

2021-11-18 Thread Alvaro Herrera
On 2021-Nov-18, Tom Lane wrote:

> Justin Pryzby  writes:
> > On Thu, Nov 18, 2021 at 12:24:14PM -0500, Tom Lane wrote:
> >> Personally I think making a glossary entry that explains what the
> >> process does would be a better plan than renaming it.
> 
> > Since d3014fff4:
> > https://www.postgresql.org/docs/devel/glossary.html#GLOSSARY-STARTUP-PROCESS
> 
> Hm ... I might've found that, if it were in alphabetical order ...

Ugh, my mistake.  If the consensus is not to rename, I'll move it to the
right place.

If we change the name, and I support the idea that we do, I think a
good name would be "wal replay".  I think "recovery" is not great
precisely because in a standby there is likely no crash that we're
recovering from.  The word "replay" is at odds with the other names,
which stand for the device that carries out the task at hand
(checkpointer, bgwriter, wal sender/receiver); but the word "replayer"
seems to be extremely uncommon and IMO looks strange.  If you see a
process that claims to be "wal replay", you know perfectly well what it
is.

The glossary entry reads:

An auxiliary process that replays WAL during crash recovery and in a
physical replica.

(The name is historical: the startup process was named before
replication was implemented; the name refers to its task as it relates
to the server startup following a crash.)

If we rename, we can drop more than half of the entry and replace it
with "This process was previously known as the startup process".

... oh, I noticed another mistake: "WAL receiver" does not say
"(process)", like all other process entries do.

-- 
Álvaro Herrera   39°49'30"S 73°17'W  —  https://www.EnterpriseDB.com/




Re: Should rename "startup process" to something else?

2021-11-18 Thread Robert Haas
On Thu, Nov 18, 2021 at 12:24 PM Tom Lane  wrote:
> Being hackers ourselves, I'm not sure we're qualified to opine on
> that.  I cannot say that I've noticed any questions about it on
> the mailing lists, though.

What I've noticed when supporting EDB customers is that if, say,
there's a problem with recovery, and they're trying to figure out
which process is responsible, they have no intuition that the startup
process is the likely culprit. They could, for example, notice whether
the LSN that shows up in the ps status is advancing over time. Or,
they could notice whether that process is doing a lot of I/O, or a lot
of CPU. But since they have no notion that a startup process has
anything to do with recovery, they don't make that connection. Now you
can argue that I ought to be happy about that because, hey, it's job
security. And you can also argue that even if the process had a better
name, a lot of people wouldn't figure it out for one reason or
another. However, my view is that we do well to make things more
comprehensible when we can.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Non-superuser subscription owners

2021-11-18 Thread Jeff Davis
On Wed, 2021-11-17 at 16:17 -0800, Mark Dilger wrote:
> Some of what I perceive as the screwiness of your argument I must
> admit is not your fault.  The properties of subscriptions are defined
> in ways that don't make sense to me.  It would be far more sensible
> if connection strings were objects in their own right, and you could
> grant USAGE on a connection string to a role,

We sort of have that with CREATE SERVER, in fact dblink can use a
server instead of a string. 

Regards,
Jeff Davis







Re: Mixing CC and a different CLANG seems like a bad idea

2021-11-18 Thread Tom Lane
I wrote:
> Yeah.  I'm inclined to think we ought to just bite the bullet and fold
> CLANG/CLANGXX into the main list of compiler switch probes, so that we
> check every interesting one four times.

After studying configure's list more closely, that doesn't seem like
a great plan either.  There's a lot of idiosyncrasy in the tests,
such as things that only apply to C or to C++.

More, I think (though this ought to be documented in a comment) that
the policy is to not bother turning on extra -W options in the bitcode
switches, on the grounds that warning once in the main build is enough.
I follow that idea --- but what we missed is that we still need to
turn *off* the warnings we're actively disabling.  I shall go do that,
if no objections.

regards, tom lane




Re: Non-superuser subscription owners

2021-11-18 Thread Jeff Davis
On Wed, 2021-11-17 at 16:17 -0800, Mark Dilger wrote:
>  You must choose a single role you want the subscription to run
> under.

I think that was the source of a lot of my confusion: 

Your patches are creating the notion of a "run as" user by assigning
ownership-that-isn't-really-ownership. I got distracted wondering why
you would really care if some user could enable/disable/refresh/rename
a subscription, but the main point was to change who the subscription
runs as.

That's a more general idea: I could see how "run as" could apply to
subscriptions as well as functions (right now it can only run as the
owner or the invoker, not an arbitrary role). And I better understand
your analogy to security definer now.

But it's also not exactly a simple idea, and I think the current
patches oversimplify it and conflate it with ownership. 

> I think the longer term plan is that non-superusers who have some
> privileged role will be allowed to create subscriptions,

You earlier listed some challenges with that:


https://postgr.es/m/cf56ac0d-7495-4e8d-a48f-ff38bd807...@enterprisedb.com

But it seems like it's really the right direction to go. Probably the
biggest concern is connection strings that read server files, but
dblink solved that by requiring password auth.

What are the reasonable steps to get there? Do you think anything is
doable for v15?

> There is a cart-before-the-horse problem here.

I don't think we need to hold up v2-0003. It seems like a step in the
right direction, though I haven't looked closely yet.

Regards,
Jeff Davis






Re: make tuplestore helper function

2021-11-18 Thread Justin Pryzby
On Thu, Nov 18, 2021 at 12:59:03PM -0500, Melanie Plageman wrote:
> On Mon, Nov 8, 2021 at 3:13 PM Justin Pryzby  wrote:
> > On Mon, Nov 08, 2021 at 02:52:28PM -0500, Melanie Plageman wrote:
> > > On Tue, Nov 2, 2021 at 4:23 PM Justin Pryzby  wrote:
> > > >
> > > > Several places have a conditional value for the first argument 
> > > > (randomAccess),
> > > > but your patch changes the behavior to a constant "true".  I didn't 
> > > > review the
> > > > patch beyond that.
> > > >
...
> > > I believe the patch has preserved the same behavior. All of the callers
> > > for which I replaced tuplestore_begin_heap() which passed a variable for
> > > the randomAccess parameter had set that variable to something which was
> > > effectively the same as passing true -- SFRM_Materialize_Random.
> >
> > I don't think so ?
> >
> > The callers aren't passing SFRM_Materialize_Random, but rather
> > (allowedModes & SFRM_Materialize_Random) != 0
> >
> > Where allowedModes is determined by EXEC_FLAG_BACKWARD.
...
> You are right. I misread it.
> 
> So, I've attached a patch where randomAccess is now an additional
> parameter (and registered for the next fest).
> 
> I was thinking about how to add a test that would have broken when I
> passed true for randomAccess to tuplestore_begin_heap() when false was
> required. But, I don't fully understand the problem. If backward
> accesses to a tuplestore are not allowed but randomAccess is mistakenly
> passed as true, could the result be incorrect output when the tuplestore
> is read backwards?

No, see here

From src/backend/utils/fmgr/README:

    The Tuplestore must be created with randomAccess = true if
    SFRM_Materialize_Random is set in allowedModes, but it can (and preferably
    should) be created with randomAccess = false if not.  Callers that can
    support both ValuePerCall and Materialize mode will set
    SFRM_Materialize_Preferred, or not, depending on which mode they prefer.

If you use "randomAccess=true" when it's not needed, the result might be less
efficient.

Some callers specify "true" when they don't need to.  I ran into that here.
https://www.postgresql.org/message-id/21724.1583955...@sss.pgh.pa.us

Maybe you'd want to add an 0002 patch which changes those to conditional ?

BTW, there should be a newline before MakeFuncResultTuplestore().

-- 
Justin




Re: Should rename "startup process" to something else?

2021-11-18 Thread Tom Lane
Justin Pryzby  writes:
> On Thu, Nov 18, 2021 at 12:24:14PM -0500, Tom Lane wrote:
>> Personally I think making a glossary entry that explains what the
>> process does would be a better plan than renaming it.

> Since d3014fff4:
> https://www.postgresql.org/docs/devel/glossary.html#GLOSSARY-STARTUP-PROCESS

Hm ... I might've found that, if it were in alphabetical order ...

regards, tom lane




Re: make tuplestore helper function

2021-11-18 Thread Melanie Plageman
On Mon, Nov 8, 2021 at 3:13 PM Justin Pryzby  wrote:
>
> On Mon, Nov 08, 2021 at 02:52:28PM -0500, Melanie Plageman wrote:
> > On Tue, Nov 2, 2021 at 4:23 PM Justin Pryzby  wrote:
> > >
> > > Several places have a conditional value for the first argument 
> > > (randomAccess),
> > > but your patch changes the behavior to a constant "true".  I didn't 
> > > review the
> > > patch beyond that.
> > >
> > > > @@ -740,18 +724,14 @@ pg_prepared_statement(PG_FUNCTION_ARGS)
> > > > - tupstore =
> > > > -     tuplestore_begin_heap(rsinfo->allowedModes & SFRM_Materialize_Random,
> > > > -                           false, work_mem);
> > > >
> > > > @@ -2701,42 +2701,13 @@ pg_hba_file_rules(PG_FUNCTION_ARGS)
> > > > - tuple_store =
> > > > -     tuplestore_begin_heap(rsi->allowedModes & SFRM_Materialize_Random,
> > > > -                           false, work_mem);
> > > >
> > > > @@ -4799,31 +4797,8 @@ pg_timezone_names(PG_FUNCTION_ARGS)
> > > > - randomAccess = (rsinfo->allowedModes & SFRM_Materialize_Random) != 0;
> > > > - tupstore = tuplestore_begin_heap(randomAccess, false, work_mem);
> > > >
> > > > @@ -575,38 +575,12 @@ pg_ls_dir_1arg(PG_FUNCTION_ARGS)
> > > > - randomAccess = (rsinfo->allowedModes & SFRM_Materialize_Random) != 0;
> > > > - tupstore = tuplestore_begin_heap(randomAccess, false, work_mem);
> > > >
> > > > @@ -1170,17 +1154,12 @@ pg_cursor(PG_FUNCTION_ARGS)
> > > > - tupstore =
> > > > -     tuplestore_begin_heap(rsinfo->allowedModes & SFRM_Materialize_Random,
> > > > -                           false, work_mem);
> > > >
> > > > +++ b/src/backend/utils/fmgr/funcapi.c
> > > > + tupstore = tuplestore_begin_heap(true, false, maxKBytes);
> >
> > I believe the patch has preserved the same behavior. All of the callers
> > for which I replaced tuplestore_begin_heap() which passed a variable for
> > the randomAccess parameter had set that variable to something which was
> > effectively the same as passing true -- SFRM_Materialize_Random.
>
> I don't think so ?
>
> The callers aren't passing SFRM_Materialize_Random, but rather
> (allowedModes & SFRM_Materialize_Random) != 0
>
> Where allowedModes is determined by EXEC_FLAG_BACKWARD.
>
> src/include/executor/executor.h:
>     extern Tuplestorestate *ExecMakeTableFunctionResult(SetExprState *setexpr,
>                                                         ExprContext *econtext,
>                                                         MemoryContext argContext,
>                                                         TupleDesc expectedDesc,
>                                                         bool randomAccess);
>
> src/backend/executor/nodeFunctionscan.c (in FunctionNext):
>     ExecMakeTableFunctionResult(node->funcstates[0].setexpr,
>                                 node->ss.ps.ps_ExprContext,
>                                 node->argcontext,
>                                 node->funcstates[0].tupdesc,
>                                 node->eflags & EXEC_FLAG_BACKWARD);
>

You are right. I misread it.

So, I've attached a patch where randomAccess is now an additional
parameter (and registered for the next fest).

I was thinking about how to add a test that would have broken when I
passed true for randomAccess to tuplestore_begin_heap() when false was
required. But, I don't fully understand the problem. If backward
accesses to a tuplestore are not allowed but randomAccess is mistakenly
passed as true, could the result be incorrect output when the tuplestore
is read backwards?

- Melanie


v3-0001-Add-helper-to-make-tuplestore.patch
Description: Binary data


Re: Should rename "startup process" to something else?

2021-11-18 Thread Justin Pryzby
On Thu, Nov 18, 2021 at 12:24:14PM -0500, Tom Lane wrote:
> Robert Haas  writes:
> > On Thu, Nov 18, 2021 at 11:05 AM Tom Lane  wrote:
> >> Yeah, given current usage it would be better to call it the "recovery
> >> process".  However, I'm feeling dubious that it's worth the cost to
> >> change.  The "startup" name is embedded in a lot of places, I think,
> >> and people are used to it.  I fear changing it would create more
> >> confusion than it removes.
> 
> > As far as being used to it, I think hackers are, but regular users are
> > very much not.
> 
> Being hackers ourselves, I'm not sure we're qualified to opine on
> that.  I cannot say that I've noticed any questions about it on
> the mailing lists, though.

A data point: I was recently confused when I observed the "startup" process
running for a bit after restarting the instance (because connections were being
rejected). I concluded that the shutdown was unclean, and started to blame the
PGDG RPM's initscript [0].

Actually, the shutdown was clean, and the "startup" process was just slow doing
$somethingelse (I imagine this will be less confusing in pg15 - 9ce346eabf).

[0] I believe this is configured such that systemd could kill -9 the postmaster
(but that's not what happened in this case).
https://redmine.postgresql.org/issues/6855

If you rename "startup", I think "recovery" would be a bad choice, since it
seems to imply that recovery/wal replay was necessary.

> Personally I think making a glossary entry that explains what the
> process does would be a better plan than renaming it.

Since d3014fff4:
https://www.postgresql.org/docs/devel/glossary.html#GLOSSARY-STARTUP-PROCESS

-- 
Justin




Re: Should rename "startup process" to something else?

2021-11-18 Thread Thomas Munro
On Tue, Nov 16, 2021 at 3:33 AM Rushabh Lathia  wrote:
> I think a better name for the process may be “recovery”

+1




Re: Mixing CC and a different CLANG seems like a bad idea

2021-11-18 Thread Tom Lane
Andres Freund  writes:
> On 2021-11-18 11:56:59 -0500, Tom Lane wrote:
>> Why did we not simply insist that if you want to use --with-llvm, the
>> selected compiler must be clang?  I cannot see any benefit of mix-and-match
>> here.

> It also just seems architecturally wrong: People pressed for making the choice
> of JIT runtime replaceable, and it now is, at some pain. And forcing the main
> compiler seems problematic from that angle.

OK, I concede that's a reasonable concern.  So we need to look more
carefully at how the switches for CLANG are being selected.

> I think the issue is more with trying to be miserly in the choice of compiler
> flag tests to duplicate and how many places to change to choose the right flag
> variable. It's taken a while for this to become a real issue, so it perhaps
> was the right choice at the time.

Yeah.  I'm inclined to think we ought to just bite the bullet and fold
CLANG/CLANGXX into the main list of compiler switch probes, so that we
check every interesting one four times.  That sounds fairly horrid,
but as long as you are using an autoconf cache file it's not really going
to cost that much.  (BTW, does meson have any comparable optimization?
If it doesn't, I bet that is going to be a problem.)

regards, tom lane




Re: Mixing CC and a different CLANG seems like a bad idea

2021-11-18 Thread Andres Freund
Hi,

On 2021-11-18 11:56:59 -0500, Tom Lane wrote:
> I noticed that, a week after Michael pushed 9ff47ea41 to silence
> -Wcompound-token-split-by-macro warnings, buildfarm member sidewinder
> is still spewing them.  Investigation shows that it's building with
> 
> configure: using compiler=cc (nb4 20200810) 7.5.0
> configure: using CLANG=ccache clang
> 
> and the system cc doesn't know -Wcompound-token-split-by-macro,
> so we don't use it, but the modules that are built into bitcode
> still produce the warnings because they're built with clang.

> I think this idea of using clang with switches selected for some other
> compiler is completely horrid, and needs to be nuked from orbit before
> it causes problems worse than mere warnings.

We can test separately for flags, see BITCODE_CFLAGS/BITCODE_CXXFLAGS.


> Why did we not simply insist that if you want to use --with-llvm, the
> selected compiler must be clang?  I cannot see any benefit of mix-and-match
> here.

It seemed like a problematic restriction at the time. And still does to
me.

For one, gcc does generate somewhat better code. For another, extensions could
still compile with a different compiler, and generate bitcode with another
compiler, so it's good to have that easily testable.

It also just seems architecturally wrong: People pressed for making the choice
of JIT runtime replaceable, and it now is, at some pain. And forcing the main
compiler seems problematic from that angle.  With the meson port jit
compilation actually kinda works on windows - but it seems we shouldn't force
people to not use visual studio there, just for that?


I think the issue is more with trying to be miserly in the choice of compiler
flag tests to duplicate and how many places to change to choose the right flag
variable. It's taken a while for this to become a real issue, so it perhaps
was the right choice at the time.

Greetings,

Andres Freund




Re: Mixing CC and a different CLANG seems like a bad idea

2021-11-18 Thread Tom Lane
Mikael Kjellström  writes:
> Hm, actually it's:

> CC => "ccache cc",
> CXX => "ccache c++",
> CLANG => "ccache clang",

Right.

> want me to change it to:

> CC => "ccache clang",
> CXX => "ccache c++",
> CLANG => "ccache clang",

What I actually think is we should get rid of the separate CLANG
variable.  But don't do anything to the animal's configuration
till that's settled.

BTW, that would presumably lead to wanting to use CXX = "ccache clang++",
too.  I think we don't really want inconsistent CC/CXX either ...
although at least configure knows it needs to probe CXX's flags
separately.  I suppose another way to resolve this would be for
configure to make a third set of probes to see what switches CLANG
has --- but ick.

regards, tom lane




Re: Mixing CC and a different CLANG seems like a bad idea

2021-11-18 Thread Mikael Kjellström

On 2021-11-18 17:56, Tom Lane wrote:

I noticed that, a week after Michael pushed 9ff47ea41 to silence
-Wcompound-token-split-by-macro warnings, buildfarm member sidewinder
is still spewing them.  Investigation shows that it's building with

configure: using compiler=cc (nb4 20200810) 7.5.0
configure: using CLANG=ccache clang



Hm, actually it's:

CC => "ccache cc",
CXX => "ccache c++",
CLANG => "ccache clang",

want me to change it to:

CC => "ccache clang",
CXX => "ccache c++",
CLANG => "ccache clang",

?

/Mikael




Re: Should rename "startup process" to something else?

2021-11-18 Thread Tom Lane
Robert Haas  writes:
> On Thu, Nov 18, 2021 at 11:05 AM Tom Lane  wrote:
>> Yeah, given current usage it would be better to call it the "recovery
>> process".  However, I'm feeling dubious that it's worth the cost to
>> change.  The "startup" name is embedded in a lot of places, I think,
>> and people are used to it.  I fear changing it would create more
>> confusion than it removes.

> What sorts of places are you thinking about?

Aside from our own code, I imagine a lot of people have monitoring
scripts that know about it.

> As far as being used to it, I think hackers are, but regular users are
> very much not.

Being hackers ourselves, I'm not sure we're qualified to opine on
that.  I cannot say that I've noticed any questions about it on
the mailing lists, though.

Personally I think making a glossary entry that explains what the
process does would be a better plan than renaming it.

regards, tom lane




Re: Should rename "startup process" to something else?

2021-11-18 Thread Robert Haas
On Thu, Nov 18, 2021 at 11:05 AM Tom Lane  wrote:
> Yeah, given current usage it would be better to call it the "recovery
> process".  However, I'm feeling dubious that it's worth the cost to
> change.  The "startup" name is embedded in a lot of places, I think,
> and people are used to it.  I fear changing it would create more
> confusion than it removes.

What sorts of places are you thinking about?

As far as being used to it, I think hackers are, but regular users are
very much not. And there are a lot more regular users than there are
hackers, and it's harder to get them to pay attention, too.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Mixing CC and a different CLANG seems like a bad idea

2021-11-18 Thread Tom Lane
I noticed that, a week after Michael pushed 9ff47ea41 to silence
-Wcompound-token-split-by-macro warnings, buildfarm member sidewinder
is still spewing them.  Investigation shows that it's building with

configure: using compiler=cc (nb4 20200810) 7.5.0
configure: using CLANG=ccache clang

and the system cc doesn't know -Wcompound-token-split-by-macro,
so we don't use it, but the modules that are built into bytecode
still produce the warnings because they're built with clang.

I think this idea of using clang with switches selected for some other
compiler is completely horrid, and needs to be nuked from orbit before
it causes problems worse than mere warnings.  Why did we not simply
insist that if you want to use --with-llvm, the selected compiler must
be clang?  I cannot see any benefit of mix-and-match here.

regards, tom lane




Re: Printing backtrace of postgres processes

2021-11-18 Thread Justin Pryzby
On Wed, Nov 17, 2021 at 08:12:44PM +0530, vignesh C wrote:
> Attached v14 patch has the fixes for the same.

Thanks for updating the patch.

I cleaned up the docs and comments.  I think this could be nearly "Ready".

If you like the changes in my "fixup" patch (0002 and 0004), you should be able
to apply my 0002 on top of your 0001.  I'm sure it'll cause some conflicts with
your 2nd patch, though...

This doesn't bump the catversion, since that would cause the patch to fail in
cfbot every time another commit updates catversion.

Your 0001 patch allows printing backtraces of autovacuum, but the doc says it's
only for "backends".  Should the change to autovacuum.c be moved to the 2nd
patch ?  Or, why is autovacuum so special that it should be handled in the
first patch ?

-- 
Justin
>From 0829c99951194df3e101e1c648a05bb5651ef090 Mon Sep 17 00:00:00 2001
From: Vignesh C 
Date: Tue, 9 Nov 2021 16:28:03 +0530
Subject: [PATCH 1/4] Add function to log the backtrace of the specified
 backend process.

This commit adds pg_log_backtrace() function that requests to log
the backtrace of the specified backend process.

Only superusers are allowed to request to log the backtrace
because allowing any users to issue this request at an unbounded rate
would cause lots of log messages and which can lead to denial of service.

On receipt of the request, at the next CHECK_FOR_INTERRUPTS(),
the target backend logs its backtrace at LOG_SERVER_ONLY level,
so that the backtrace will appear in the server log but not
be sent to the client.

Bump catalog version.
---
 doc/src/sgml/func.sgml| 85 +++
 src/backend/catalog/system_functions.sql  |  2 +
 src/backend/postmaster/autovacuum.c   |  4 ++
 src/backend/storage/ipc/procsignal.c  | 41 +++
 src/backend/storage/ipc/signalfuncs.c | 49 +
 src/backend/tcop/postgres.c   |  4 ++
 src/backend/utils/error/elog.c| 20 --
 src/backend/utils/init/globals.c  |  1 +
 src/include/catalog/pg_proc.dat   |  5 ++
 src/include/miscadmin.h   |  1 +
 src/include/storage/procsignal.h  |  3 +-
 src/include/utils/elog.h  |  2 +
 src/test/regress/expected/backtrace.out   | 42 +++
 src/test/regress/expected/backtrace_1.out | 46 
 src/test/regress/parallel_schedule|  2 +-
 src/test/regress/sql/backtrace.sql| 30 
 16 files changed, 331 insertions(+), 6 deletions(-)
 create mode 100644 src/test/regress/expected/backtrace.out
 create mode 100644 src/test/regress/expected/backtrace_1.out
 create mode 100644 src/test/regress/sql/backtrace.sql

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 24447c0017..a88583edfd 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -25345,6 +25345,32 @@ SELECT collation for ('foo' COLLATE "de_DE");

   
 
+  
+   
+
+ pg_log_backtrace
+
+pg_log_backtrace ( pid integer )
+boolean
+   
+   
+Requests to log the backtrace of the
+backend
+with the specified process ID. The backtrace will be logged at message
+level LOG. It will appear in the server log based on
+the log configuration set (See 
+for more information), but will not be sent to the client regardless of
+. A backtrace will identify
+where exactly the backend process is currently executing. This may be
+useful to developers to diagnose stuck processes and other problems.
+This feature is not supported for the postmaster, logger, checkpointer,
+walwriter, background writer or statistics collector process. This
+feature will be available if PostgreSQL was built with the ability to
+capture backtrace. If not available, the function will emit a warning
+and return false.
+   
+  
+
   

 
@@ -25458,6 +25484,65 @@ LOG:  Grand total: 1651920 bytes in 201 blocks; 622360 free (88 chunks); 1029560
 because it may generate a large number of log messages.

 
+   
+pg_log_backtrace can be used to log the backtrace of
+a backend process. For example:
+
+postgres=# select pg_log_backtrace(pg_backend_pid());
+ pg_log_backtrace
+--
+ t
+(1 row)
+
+The backtrace will be logged to the log file if logging is enabled, if logging
+is disabled backtrace will be logged to the console where the postmaster was
+started. For example:
+
+2021-01-27 11:33:50.247 IST [111735] LOG:  current backtrace:
+postgres: postgresdba postgres [local] SELECT(set_backtrace+0x38) [0xae06c5]
+postgres: postgresdba postgres [local] SELECT(ProcessInterrupts+0x788) [0x950c34]
+postgres: postgresdba postgres [local] SELECT() [0x761e89]
+postgres: postgresdba postgres [local] SELECT() [0x71bbda]
+postgres: postgresdba postgres [local] SELECT() [0x71e380]
+

Re: Should rename "startup process" to something else?

2021-11-18 Thread Tom Lane
Robert Haas  writes:
> That's true, but those tasks are very brief. Nobody's going to get too
> confused by a "recovery" process that shows up for a few milliseconds
> when there's no recovery to be done. Having a "startup" process that
> sticks around forever on a standby, though, actually is confusing.

Yeah, given current usage it would be better to call it the "recovery
process".  However, I'm feeling dubious that it's worth the cost to
change.  The "startup" name is embedded in a lot of places, I think,
and people are used to it.  I fear changing it would create more
confusion than it removes.

regards, tom lane




Re: Non-superuser subscription owners

2021-11-18 Thread Mark Dilger



> On Nov 18, 2021, at 3:37 AM, Amit Kapila  wrote:
> 
>> I have rethought my prior analysis.  The problem in the previous patch was 
>> that the subscription apply workers did not check for a change in ownership 
>> the way they checked for other changes, instead only picking up the new 
>> ownership information when the worker restarted for some other reason.  This 
>> next patch set fixes that.  The application of a change record may continue 
>> under the old ownership permissions when a concurrent command changes the 
>> ownership of the subscription, but the worker will pick up the new 
>> permissions before applying the next record.
>> 
> 
> Are you talking about the below change in the above paragraph?
> 
> @@ -2912,6 +2941,7 @@ maybe_reread_subscription(void)
>  strcmp(newsub->slotname, MySubscription->slotname) != 0 ||
>  newsub->binary != MySubscription->binary ||
>  newsub->stream != MySubscription->stream ||
> + newsub->owner != MySubscription->owner ||
>  !equal(newsub->publications, MySubscription->publications))
>  {
> 
> If so, I am not sure how it will ensure that we check the ownership
> change before applying each change? I think this will be invoked at
> each transaction boundary, so, if there is a transaction with a large
> number of changes, all the changes will be processed under the
> previous owner.

Yes, your analysis appears correct.  I was sloppy to say "before applying the 
next record".  It will pick up the change before applying the next transaction.

The prior version of the patch only picked up the change if it happened to 
start a new worker, but could process multiple transactions without noticing 
the change.  Now, it is limited to finishing the current transaction.  Would 
you prefer that the worker noticed the change in ownership and aborted the 
transaction on the subscriber side?  Or should the ALTER SUBSCRIPTION..OWNER TO 
block?  I don't see much advantage to either of those options, but I also don't 
think I have any knock-down argument for my approach either.  What do you think?

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: Non-superuser subscription owners

2021-11-18 Thread Mark Dilger



> On Nov 18, 2021, at 2:50 AM, Amit Kapila  wrote:
> 
>> I gave that a slight amount of thought during the design of this patch, but 
>> didn't think we could refuse to revoke superuser on such a basis, and didn't 
>> see what we should do with the subscription other than have it continue to 
>> be owned by the recently-non-superuser.  If you have a better idea, we can 
>> discuss it, but to some degree I think that is also orthogonal to the 
>> purpose of this patch.  The only sense in which this patch depends on that 
>> issue is that this patch proposes that non-superuser subscription owners are 
>> already an issue, and therefore that this patch isn't creating a new issue, 
>> but rather making more sane something that already can happen.
>> 
> 
> Don't we want to close this gap irrespective of the other part of the
> feature? I mean if we take out the part of your 0003 patch that checks
> whether the current user has permission to perform a particular
> operation on the target table then the gap related to the owner losing
> superuser privileges should be addressed.

I don't think there is a gap.  The patch does the right thing, causing the 
subscription whose owner has had superuser revoked to itself no longer function 
with superuser privileges.  Whether that causes the subscription to fail 
depends on whether the previously-superuser now non-superuser owner now lacks 
sufficient privileges on the target relation(s).  I think removing that part of 
the patch would be a regression.

Let's compare two scenarios.  In the first, we have a regular user "alice" who 
owns a subscription which replicates into table "accounting.receipts" for which 
she has been granted privileges by the table's owner.  What would you expect to 
happen after the table's owner revokes privileges from alice?  I would expect 
that the subscription can no longer function, and periodic attempts to 
replicate into that table result in permission denied errors in the logs.

In the second, we have a superuser "alice" who owns a subscription that 
replicates into table "accounting.receipts", and she only has sufficient 
privileges to modify "accounting.receipts" by virtue of being superuser.  I 
would expect that when she has superuser revoked, the subscription can likewise 
no longer function.  

Now, maybe I'm wrong in both cases, and both should continue to function.  But 
I would find it really strange if the first situation behaved differently from 
the second.

I think intuitions about how subscriptions behave differ depending on the 
reason you expect the subscription to be owned by a particular user.  If the 
reason the user owns the subscription is that the user just happens to be the 
user who created it, but isn't in your mind associated with the subscription, 
then having the subscription continue to function regardless of what happens to 
the user, even the user being dropped, is probably consistent with your 
expectations.  In a sense, you think of the user who creates the subscription 
as having gifted it to the universe rather than continuing to own it.  Or 
perhaps you think of the creator of the subscription as a 
solicitor/lawyer/agent working on behalf of client, and once that legal 
transaction is completed, you don't expect the lawyer being disbarred should 
impact the subscription which exists for the benefit of the client.

If instead you think about the subscription owner as continuing to be closely 
associated with the subscription (as I do), then you expect changes in the 
owner's permissions to impact the subscription.

I think the "gifted to the universe"/"lawyer" mental model is not consistent 
with how the system is already designed to work.  You can't drop the 
subscription's owner without first running REASSIGN OWNED, or ALTER 
SUBSCRIPTION..OWNER TO, or simply dropping the subscription:

  DROP ROLE regress_subscription_user;
  ERROR:  role "regress_subscription_user" cannot be dropped because some 
objects depend on it
  DETAIL:  owner of subscription regress_testsub


—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







RE: Failed transaction statistics to measure the logical replication progress

2021-11-18 Thread osumi.takami...@fujitsu.com
On Thursday, November 18, 2021 12:26 PM Amit Kapila  
wrote:
> BTW, I think the way you are computing error_count in
> pgstat_recv_subworker_error() doesn't seem correct to me because it will
> accumulate the counter/bytes for the same error again and again.
> You might want to update these counters after we have checked that the
> received error is not the same as the previous one.
Thank you for your comments !
This is addressed by v13 patchset [1]


[1] - 
https://www.postgresql.org/message-id/TYCPR01MB8373533A5C24BDDA516DA7E1ED9B9%40TYCPR01MB8373.jpnprd01.prod.outlook.com

Best Regards,
Takamichi Osumi



Re: XLogReadRecord() error in XlogReadTwoPhaseData()

2021-11-18 Thread Tom Lane
Andrey Borodin  writes:
> Let's add more tests that check survival of 2PC through crash recovery? We do 
> now only one restart. Maybe it worth to do 4 or 8?

That seems a little premature when we can't explain the failure
we have.  Also, buildfarm cycles aren't free.

regards, tom lane




RE: Failed transaction statistics to measure the logical replication progress

2021-11-18 Thread osumi.takami...@fujitsu.com
On Thursday, November 18, 2021 8:35 PM vignesh C  wrote:
> On Tue, Nov 16, 2021 at 6:04 PM osumi.takami...@fujitsu.com
>  wrote:
> >
> > On Monday, November 15, 2021 9:14 PM I wrote:
> > > I've conducted some update for this.
> > > (The rebased part is only C code and checked by pgindent)
> > I'll update my patches since a new skip xid patch has been shared in
> > [1].
> >
> > This version includes some minor renames of functions that are related
> > to transaction sizes.
> 
> Thanks for the updated patch, Few comments:
Thank you for checking the patches !


> 1) since pgstat_get_subworker_prepared_txn is called from only one place and
> create is passed as true, we can remove create function parameter or the
> function could be removed.
> + * Return subscription worker entry with the given subscription OID and
> + * gid.
> + * --
> + */
> +static PgStat_SW_PreparedXactEntry *
> +pgstat_get_subworker_prepared_txn(Oid databaseid, Oid subid,
> + char
> *gid, bool create)
> +{
> +   PgStat_StatDBEntry *dbentry;
> +   PgStat_SW_PreparedXactKey key;
Removed the parameter.


>  2) Include subworker prepared transactions also
>   /*
> * Don't create tables/functions/subworkers hashtables for
> * uninteresting databases.
> */
> if (onlydb != InvalidOid)
> {
> if (dbbuf.databaseid != onlydb &&
> dbbuf.databaseid != InvalidOid)
> break;
> }
Fixed.


> 3) Similarly it should be mentioned in:
> reset_dbentry_counters function header, pgstat_read_db_statsfile function
> header and pgstat_get_db_entry function comments.
Fixed.

 
> 4) I felt we can remove "COMMIT of streaming transaction", since only commit
> and commit prepared are the user operations. Shall we change it to "COMMIT
> and COMMIT PREPARED will increment this counter."
> +   commit_count bigint
> +  
> +  
> +   Number of transactions successfully applied in this subscription.
> +   COMMIT, COMMIT of streaming transaction and COMMIT
> PREPARED increments
> +   this counter.
> +  
> + 
You are right ! Fixed.

 
> 5) PgStat_SW_PreparedXactEntry should be before
> PgStat_SW_PreparedXactKey  PgStat_StatSubWorkerEntry
> PgStat_StatSubWorkerKey
> +PgStat_SW_PreparedXactKey
> +PgStat_SW_PreparedXactEntry
>  PgStat_StatTabEntry
>  PgStat_SubXactStatus
Fixed.

> 6)  This change is not required
> @@ -293,6 +306,7 @@ static inline void cleanup_subxact_info(void);  static
> void stream_cleanup_files(Oid subid, TransactionId xid);  static void
> stream_open_file(Oid subid, TransactionId xid, bool first);  static void
> stream_write_change(char action, StringInfo s);
> +
>  static void stream_close_file(void);
Removed.


Other changes are
1. refined the commit message of v13-0003*.
2. made the additional comment for ApplyErrorCallbackArg simple.
3. wrote more explanations about update_apply_change_size() as comment.
4. changed the behavior of pgstat_recv_subworker_error so that
   it can store stats info only once per error.
5. added one simple test for PREPARE and COMMIT PREPARED.

This used v23 skip xid patch [1].
(I will remove v13-0001* when the column names are fixed
and Sawada-san starts to take care of the column name definitions)

[1] - 
https://www.postgresql.org/message-id/CAD21AoA5jupM6O%3DpYsyfaxQ1aMX-en8%3DQNgpW6KfXsg7_CS0CQ%40mail.gmail.com

Best Regards,
Takamichi Osumi



v13-0001-Rename-existing-columns-of-pg_stat_subscription_.patch
Description:  v13-0001-Rename-existing-columns-of-pg_stat_subscription_.patch


v13-0002-Export-PartitionTupleRouting-for-transaction-siz.patch
Description:  v13-0002-Export-PartitionTupleRouting-for-transaction-siz.patch


v13-0003-Extend-pg_stat_subscription_workers-to-include-g.patch
Description:  v13-0003-Extend-pg_stat_subscription_workers-to-include-g.patch


Re: Teach pg_receivewal to use lz4 compression

2021-11-18 Thread Jeevan Ladhe
In dir_open_for_write() I observe that we are writing the header
and then calling LZ4F_compressEnd() in case there is an error
while writing the buffer to the file, and the output buffer of
LZ4F_compressEnd() is not written anywhere. Why should this be
necessary? To flush the contents of the internal buffer? But, then we
are calling LZ4F_freeCompressionContext() immediately after the
LZ4F_compressEnd() call. I might be missing something, will be
happy to get more insights.

Regards,
Jeevan Ladhe

On Fri, Nov 5, 2021 at 1:21 PM  wrote:

>
>
> ‐‐‐ Original Message ‐‐‐
>
> On Friday, November 5th, 2021 at 3:47 AM, Michael Paquier <
> mich...@paquier.xyz> wrote:
>
> >
> > I have spent an extra couple of hours staring at the code, and the
> > whole looked fine, so applied. While on it, I have tested the new TAP
> > tests with all the possible combinations of --without-zlib and
> > --with-lz4.
>
> Great news. Thank you very much.
>
> Cheers,
> //Georgios
>
> > --
> > Michael
>
>
>


Re: Should rename "startup process" to something else?

2021-11-18 Thread Robert Haas
On Thu, Nov 18, 2021 at 5:34 AM Bharath Rupireddy
 wrote:
> The startup process main function looks to be to do some
> initialization required for recovery and do the recovery, exit if it
> is a crash recovery or stay there if it is a standby recovery. Apart
> from these I'm not sure it does any non-recovery tasks. Does the
> startup process have any relevance or work to do if the server state
> is DB_SHUTDOWNED i.e the server was shutdown properly? I think yes,
> although it doesn't enter the recovery/REDO loop, it does a bunch of
> other things like updating the control file, pre-allocating WAL files,
> updating shared memory state to allow other backends to write WAL and
> so on.

That's true, but those tasks are very brief. Nobody's going to get too
confused by a "recovery" process that shows up for a few milliseconds
when there's no recovery to be done. Having a "startup" process that
sticks around forever on a standby, though, actually is confusing.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




[RFC] ASOF Join

2021-11-18 Thread Alexander Kuzmenkov
Hi hackers,

There was some interest in implementing ASOF joins in Postgres, see
e.g. this prototype patch by Konstantin Knizhnik:
https://www.postgresql.org/message-id/flat/bc494762-26bd-b100-e1f9-a97901ddad57%40postgrespro.ru
I'd like to discuss the possible ways of implementing it, if there is
still any interest in that.


Introduction

ASOF join is often used to work with time series data such as stock
quotes or IoT sensors. It is an interpolation where we want to relate
two different time series measured at different points in time. For
each value of the first time series, we take the most recent value of
the second.

Besides an inequality condition on timestamp, such join can also have
equality conditions on other columns. For example, this query joins
two tables that contain bids and asks, finding the most recent ask
for each bid for a given financial instrument:

```sql
SELECT bids.ts timebid, bid, ask
FROM bids
ASOF JOIN asks ON bids.instrument = asks.instrument
AND asks.ts <= bids.ts;
```

Semantically, this is equivalent to the following correlated subquery:
```sql
SELECT bids.ts timebid, bid, ask
FROM bids,
LATERAL (select * from asks
WHERE asks.instrument = bids.instrument AND asks.ts <= bids.ts
ORDER BY ts DESC LIMIT 1) t;
```
This form is useful for thinking about which optimizations we can
perform with an ASOF join, how it behaves with respect to other joins,
and so on.
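
To make the semantics concrete, here is a brute-force Python reference
implementation of the inner ASOF join (illustrative only; the data and
function names are made up, and this is of course not how an executor
would implement it):

```python
# Brute-force reference semantics of an inner ASOF join: for each bid,
# pick the ask with the same instrument and the latest ts <= the bid's ts.
def asof_join(bids, asks):
    result = []
    for b in bids:
        candidates = [a for a in asks
                      if a["instrument"] == b["instrument"]
                      and a["ts"] <= b["ts"]]
        if candidates:  # INNER join: bids with no match are dropped
            best = max(candidates, key=lambda a: a["ts"])
            result.append((b["ts"], b["bid"], best["ask"]))
    return result

bids = [{"instrument": "X", "ts": 10, "bid": 100},
        {"instrument": "X", "ts": 20, "bid": 101}]
asks = [{"instrument": "X", "ts": 9, "ask": 99},
        {"instrument": "X", "ts": 15, "ask": 98}]
print(asof_join(bids, asks))  # [(10, 100, 99), (20, 101, 98)]
```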

QuestDB has some good docs on this with more examples:
https://questdb.io/docs/reference/sql/join/#asof-join


What Conditions Work with ASOF Join

Conditions for an ASOF join consist of one inequality condition (>=
etc), and optionally a number of equality conditions. All these
conditions must be "mergejoinable" in PG terms -- they must belong to
a btree operator family, which means there is a sorting operator that
corresponds to the condition, which means we can perform a merge join.
They also must support hashing because we'll probably need both
sorting and hashing for implementation (see below). This holds for the
usual data types like numeric. It is natural to think of the
inequality column as "time", but technically it can be any column,
even a string one, w/o changing the algorithm.


Join variants

The purpose of ASOF join is interpolation of one time series to match
another, so it is natural to think of it as an INNER join. The outer
variants might be less useful. Technically, it is easy to extend it to
LEFT ASOF JOIN, where we would output nulls for the right hand columns
if we haven’t yet seen a match. RIGHT and FULL variants also make
sense, but the implementation may be impossible, depending on the
algorithm -- merge and hash joins can support these variants, but the
nested loop cannot.


Use in Combination with Normal Joins

The difference of ASOF join from normal join is that for the
inequality condition, it does not output all the rows that match it,
but only the most recent one. We can think of it as first performing a
normal join and then applying a filter that selects the latest right
hand row. Which row is the "latest" depends on the entire set of rows
that match the join conditions (same as with LIMIT). This means that
the result of ASOF join may depend on the place in the join tree where
it is evaluated, because other joins may remove some rows. Similar to
outer joins, we must respect the user-specified join order for an ASOF
join. It is useful to think about pushing another join below an ASOF
join as pushing a join below a correlated subquery with LIMIT (see
above). This transformation might be correct in some cases, so we
might later think about adding some optimization for join order for
ASOF join.
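
As a toy illustration of the order dependence (made-up data, not tied
to any particular plan):

```python
# Why join order matters for ASOF: dropping a right-hand row before the
# ASOF pick (as an earlier join might) changes which row is "latest".
def latest_at_or_before(rows, ts):
    matches = [r for r in rows if r[0] <= ts]
    return max(matches)[1] if matches else None

asks = [(9, "a9"), (15, "a15")]

# ASOF pick first: the most recent ask at ts=20 is "a15".
picked_then_filtered = latest_at_or_before(asks, 20)

# Another join removes the ts=15 row first, then the ASOF pick runs:
filtered = [r for r in asks if r[1] != "a15"]
filtered_then_picked = latest_at_or_before(filtered, 20)

print(picked_then_filtered, filtered_then_picked)  # a15 a9
```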


Proposed Syntax

ASOF join is semantically distinct from a normal join on the same
conditions, so it requires separate grammar. An ASOF modifier, with all
the conditions listed in the ON clause, looks like a good baseline:
`bids ASOF JOIN asks ON asks.timestamp <= bids.timestamp AND
asks.instrument = bids.instrument`


Algorithms

Let's see which algorithm we can use to perform an ASOF join if we
have a "<=" condition on timestamp and several "=" conditions on other
columns (equi-columns).

1. Hash on Equi-keys

This is what ClickHouse uses. It builds a hash table on equi columns,
then for each equi-key builds an array of timestamps, sorted on
demand. This requires bringing the entire right hand table into
memory, so not feasible for large tables.
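
A sketch of this scheme in Python (assuming, as described above, that
the right-hand table fits in memory; the <= condition maps to a binary
search over each per-key sorted array):

```python
from bisect import bisect_right
from collections import defaultdict

# Hash table on the equi-key, with a per-key array of (ts, value) sorted
# on timestamp; probing binary-searches for the latest ts <= the probe ts.
def build_hash(asks):
    table = defaultdict(list)
    for instrument, ts, ask in asks:
        table[instrument].append((ts, ask))
    for rows in table.values():
        rows.sort()
    return table

def probe(table, instrument, ts):
    rows = table.get(instrument)
    if not rows:
        return None
    i = bisect_right(rows, (ts, float("inf")))  # last entry with row ts <= ts
    return rows[i - 1][1] if i > 0 else None

table = build_hash([("X", 9, 99), ("X", 15, 98), ("Y", 30, 50)])
print(probe(table, "X", 20))  # 98
print(probe(table, "X", 5))   # None
```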


2. Merge Join on (equi-keys, timestamp) Sorting

This is a natural extension of the merge join algorithm, but instead
of returning all keys for the timestamp column, it returns only the
latest one. A drawback of this algorithm is that the data must be
sorted on timestamp last, so we can't reuse the natural ordering of
the time series data encoded by a (timestamp) index. We will have to
sort both tables entirely in a different order, which is prohibitively
costly for large tables. Another way is to create an index on
(equi-keys, timestamp). This would 

Re: Propose a new hook for mutating the query bounds

2021-11-18 Thread Tomas Vondra

On 11/18/21 10:59, Xiaozhe Yao wrote:

Hi,

Thanks for the previous feedbacks!

 > The way the hook is used seems pretty inconvenient, though.

I see the problem, and I agree.

I looked into how other hooks work, and I am wondering if it looks ok if 
we pass a pointer to the hook, and let the hook check if there is any 
applicable information. If there is none, the hook just returns false 
and we let the rest of the code handle it. If it returns true, we get the 
selectivity from the hook and return it. So something like


```
if (clauselist_selectivity_hook &&
(*clauselist_selectivity_hook) (root, clauses, varRelid, jointype, 
sjinfo, use_extended_stats, &s1))
{
return s1;
}
```



No, that doesn't really solve the issue, because it's an all-or-nothing 
approach. What if your ML model can help estimate just a subset of the 
clauses? IMHO the hooks should allow estimating the clauses the ML model 
was built on, and then do the usual estimation for the remaining ones. 
Otherwise you still have to copy large parts of the code.
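
For instance, the combination could look roughly like this (a hedged
sketch with invented names, not an actual PostgreSQL API; the two groups
are combined under the planner's usual independence assumption):

```python
# The model jointly estimates the clauses it was built on; the remaining
# clauses fall back to the default per-clause estimates.
class ToyModel:
    def __init__(self, known, joint):
        self.known = set(known)   # clauses the model covers
        self.joint = joint        # their jointly estimated selectivity

def combined_selectivity(clauses, model, default_sel):
    covered = [c for c in clauses if c in model.known]
    s = model.joint if covered else 1.0
    for c in clauses:
        if c not in model.known:
            s *= default_sel[c]   # usual planner estimate for the rest
    return s

model = ToyModel({"a = 1", "b < 2"}, joint=0.05)
default_sel = {"c = 3": 0.1}
print(combined_selectivity(["a = 1", "b < 2", "c = 3"], model, default_sel))
```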


What I am trying to mock is the get_index_stats_hook 
(https://github.com/taminomara/psql-hooks/blob/master/Detailed.md#get_index_stats_hook 
). 



But that hook only deals with a single index at a time - either it finds 
stats for it or not. This new hook deals with a list of clauses, so it 
should allow processing just a subset of them, I think.



regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Skipping logical replication transactions on subscriber side

2021-11-18 Thread Masahiko Sawada
On Wed, Nov 17, 2021 at 7:46 PM vignesh C  wrote:
>
> On Tue, Nov 16, 2021 at 12:01 PM Masahiko Sawada  
> wrote:
> >
> > On Mon, Nov 15, 2021 at 11:43 PM vignesh C  wrote:
> > >
> > > On Mon, Nov 15, 2021 at 2:48 PM Masahiko Sawada  
> > > wrote:
> > > >
> > > > On Mon, Nov 15, 2021 at 4:49 PM Greg Nancarrow  
> > > > wrote:
> > > > >
> > > > > On Mon, Nov 15, 2021 at 1:49 PM Masahiko Sawada 
> > > > >  wrote:
> > > > > >
> >
> > Right. I've fixed this issue and attached an updated patch.
>
> Few comments:

Thank you for the comments!

> 1) should we set subwentry to NULL to handle !create && !found case
> or we could return NULL similar to the earlier function.
> +static PgStat_StatSubWorkerEntry *
> +pgstat_get_subworker_entry(PgStat_StatDBEntry *dbentry, Oid subid,
> Oid subrelid,
> +  bool create)
> +{
> +   PgStat_StatSubWorkerEntry *subwentry;
> +   PgStat_StatSubWorkerKey key;
> +   boolfound;
> +   HASHACTION  action = (create ? HASH_ENTER : HASH_FIND);
> +
> +   key.subid = subid;
> +   key.subrelid = subrelid;
> +   subwentry = (PgStat_StatSubWorkerEntry *) hash_search(dbentry->subworkers,
> +  (void *) &key,
> +  action, &found);
> +
> +   /* If not found, initialize the new one */
> +   if (create && !found)

It's better to return NULL if !create && !found. WIll fix.

>
> 2) Should we keep the line width to 80 chars:
> +/* --
> + * PgStat_MsgSubWorkerError   Sent by the apply worker or the table sync
> + * worker to report the error occurred during logical replication.
> + * --
> + */
> +#define PGSTAT_SUBWORKERERROR_MSGLEN 256
> +typedef struct PgStat_MsgSubWorkerError
> +{

Hmm, pg_indent seems not to fix it. Anyway, will fix.

I'll submit an updated patch.

Regards,

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/




Re: Skipping logical replication transactions on subscriber side

2021-11-18 Thread Masahiko Sawada
On Wed, Nov 17, 2021 at 12:43 PM houzj.f...@fujitsu.com
 wrote:
>
> On Tues, Nov 16, 2021 2:31 PM Masahiko Sawada  wrote:
> > Right. I've fixed this issue and attached an updated patch.
>
> Hi,
>
> Thanks for updating the patch.
> Here are few comments.

Thank you for the comments!

>
> 1)
>
> +pg_stat_reset_subscription_worker ( 
> subid oid,  
> relid oid  )
>
> It seems we should put '' before the comma(',').

Will fix.

>
>
> 2)
> + 
> +  
> +   subrelid oid
> +  
> +  
> +   OID of the relation that the worker is synchronizing; null for the
> +   main apply worker
> +  
> + 
>
> Is the 'subrelid' only used for distinguishing the worker type? If so,
> would it be clearer to have a string value here? I recall the previous
> version of the patch had a failure_source column, but it was removed.
> Maybe I missed something.

As Amit mentioned, users can use this check which table sync worker.

>
>
> 3)
> .
> +extern void pgstat_reset_subworker_stats(Oid subid, Oid subrelid, bool 
> allstats);
>
> I didn't find the code of this functions, maybe we can remove this 
> declaration ?

Will remove.

I'll submit an updated patch.

Regards,

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/




Re: Allow CURRENT_ROLE in GRANTED BY

2021-11-18 Thread Daniel Gustafsson



> On 18 Nov 2021, at 14:41, Peter Eisentraut 
>  wrote:
> 
> On 16.11.21 15:27, Daniel Gustafsson wrote:
 On 16 Nov 2021, at 15:04, Daniel Gustafsson  wrote:
>>> ..or should the attached small diff be applied to fix it?
>> Actually it shouldn't, I realized when hitting Send that it was the wrong
>> version.  The attached is the proposed diff.
> 
> This appears to have been an oversight.

Thanks for confirming, I’ll take another pass over the proposed diff in a bit.



Re: Allow CURRENT_ROLE in GRANTED BY

2021-11-18 Thread Peter Eisentraut

On 16.11.21 15:27, Daniel Gustafsson wrote:

On 16 Nov 2021, at 15:04, Daniel Gustafsson  wrote:



..or should the attached small diff be applied to fix it?


Actually it shouldn't, I realized when hitting Send that it was the wrong
version.  The attached is the proposed diff.


This appears to have been an oversight.




Re: pg_walinspect - a new extension to get raw WAL data and WAL stats

2021-11-18 Thread Bharath Rupireddy
On Thu, Oct 7, 2021 at 10:43 AM Bharath Rupireddy <
bharath.rupireddyforpostg...@gmail.com> wrote:
> > Looking at the proposed API from the initial email, I like that there's
> > both stats functionality and WAL record inspection functionality
> > (similar to pg_waldump). I like that there's the ability to pull the
> > records as raw bytea data, however I think we're also going to want an
> > ability in v1 of the patch to decode it (similar to pageinspect
> > heap_page_item_attrs, etc).
>
> I'm yet to start working on the patch. I will be doing it soon.

Thanks all. Here's the v1 patch set for the new extension pg_walinspect.
Note that I didn't include the documentation part now, I will be doing it a
bit later.

Please feel free to review and provide your thoughts.

Regards,
Bharath Rupireddy.
From 712077e3ea86790e887f72ec3546a8661a62389a Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy 
Date: Wed, 17 Nov 2021 13:43:41 +
Subject: [PATCH v1] pg_walinspect

---
 contrib/Makefile |   1 +
 contrib/pg_walinspect/.gitignore |   4 +
 contrib/pg_walinspect/Makefile   |  26 +
 contrib/pg_walinspect/pg_walinspect--1.0.sql | 106 +++
 contrib/pg_walinspect/pg_walinspect.c| 801 +++
 contrib/pg_walinspect/pg_walinspect.control  |   5 +
 src/backend/access/transam/xlogreader.c  |  14 +-
 src/include/access/xlogreader.h  |   2 -
 8 files changed, 950 insertions(+), 9 deletions(-)
 create mode 100644 contrib/pg_walinspect/.gitignore
 create mode 100644 contrib/pg_walinspect/Makefile
 create mode 100644 contrib/pg_walinspect/pg_walinspect--1.0.sql
 create mode 100644 contrib/pg_walinspect/pg_walinspect.c
 create mode 100644 contrib/pg_walinspect/pg_walinspect.control

diff --git a/contrib/Makefile b/contrib/Makefile
index 87bf87ab90..780059ce66 100644
--- a/contrib/Makefile
+++ b/contrib/Makefile
@@ -39,6 +39,7 @@ SUBDIRS = \
 		pgrowlocks	\
 		pgstattuple	\
 		pg_visibility	\
+		pg_walinspect	\
 		postgres_fdw	\
 		seg		\
 		spi		\
diff --git a/contrib/pg_walinspect/.gitignore b/contrib/pg_walinspect/.gitignore
new file mode 100644
index 00..5dcb3ff972
--- /dev/null
+++ b/contrib/pg_walinspect/.gitignore
@@ -0,0 +1,4 @@
+# Generated subdirectories
+/log/
+/results/
+/tmp_check/
diff --git a/contrib/pg_walinspect/Makefile b/contrib/pg_walinspect/Makefile
new file mode 100644
index 00..c92a97447f
--- /dev/null
+++ b/contrib/pg_walinspect/Makefile
@@ -0,0 +1,26 @@
+# contrib/pg_walinspect/Makefile
+
+MODULE_big = pg_walinspect
+OBJS = \
+	$(WIN32RES) \
+	pg_walinspect.o
+PGFILEDESC = "pg_walinspect - functions to inspect contents of PostgreSQL Write-Ahead Log"
+
+PG_CPPFLAGS = -I$(libpq_srcdir)
+SHLIB_LINK_INTERNAL = $(libpq)
+
+EXTENSION = pg_walinspect
+DATA = pg_walinspect--1.0.sql
+
+REGRESS = pg_walinspect
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = contrib/pg_walinspect
+top_builddir = ../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/contrib/pg_walinspect/pg_walinspect--1.0.sql b/contrib/pg_walinspect/pg_walinspect--1.0.sql
new file mode 100644
index 00..bd641abd04
--- /dev/null
+++ b/contrib/pg_walinspect/pg_walinspect--1.0.sql
@@ -0,0 +1,106 @@
+/* contrib/pg_walinspect/pg_walinspect--1.0.sql */
+
+-- complain if script is sourced in psql, rather than via CREATE EXTENSION
+\echo Use "CREATE EXTENSION pg_walinspect" to load this file. \quit
+
+--
+-- pg_get_raw_wal_record()
+--
+CREATE FUNCTION pg_get_raw_wal_record(IN in_lsn pg_lsn,
+OUT lsn pg_lsn,
+OUT record bytea
+)
+AS 'MODULE_PATHNAME', 'pg_get_raw_wal_record'
+LANGUAGE C STRICT PARALLEL SAFE;
+
+REVOKE EXECUTE ON FUNCTION pg_get_raw_wal_record(pg_lsn) FROM PUBLIC;
+GRANT EXECUTE ON FUNCTION pg_get_raw_wal_record(pg_lsn) TO pg_monitor;
+--
+-- pg_get_first_valid_wal_record_lsn()
+--
+CREATE FUNCTION pg_get_first_valid_wal_record_lsn(IN in_lsn pg_lsn,
+OUT lsn pg_lsn
+)
+AS 'MODULE_PATHNAME', 'pg_get_first_valid_wal_record_lsn'
+LANGUAGE C STRICT PARALLEL SAFE;
+
+REVOKE EXECUTE ON FUNCTION pg_get_first_valid_wal_record_lsn(pg_lsn) FROM PUBLIC;
+GRANT EXECUTE ON FUNCTION pg_get_first_valid_wal_record_lsn(pg_lsn) TO pg_monitor;
+
+--
+-- pg_verify_raw_wal_record()
+--
+CREATE FUNCTION pg_verify_raw_wal_record(IN record bytea,
+OUT is_valid boolean
+)
+AS 'MODULE_PATHNAME', 'pg_verify_raw_wal_record'
+LANGUAGE C STRICT PARALLEL SAFE;
+
+REVOKE EXECUTE ON FUNCTION pg_verify_raw_wal_record(bytea) FROM PUBLIC;
+GRANT EXECUTE ON FUNCTION pg_verify_raw_wal_record(bytea) TO pg_monitor;
+
+--
+-- pg_get_wal_record_info()
+--
+CREATE FUNCTION pg_get_wal_record_info(IN in_lsn pg_lsn,
+OUT lsn pg_lsn,
+OUT prev_lsn pg_lsn,
+OUT xid xid,
+OUT rmgr text,
+OUT length int4,
+OUT total_length int4,
+	OUT description text,
+OUT block_ref text,
+OUT data bytea,
+OUT data_len 

Re: XLogReadRecord() error in XlogReadTwoPhaseData()

2021-11-18 Thread Andrey Borodin



> On 18 Nov 2021, at 12:05, Noah Misch wrote:
> 
> What else might help?

Let's add more tests that check survival of 2PC through crash recovery? We
currently do only one restart. Maybe it's worth doing 4 or 8?

Best regards, Andrey Borodin.



Re: row filtering for logical replication

2021-11-18 Thread Amit Kapila
On Thu, Nov 18, 2021 at 11:02 AM Peter Smith  wrote:
>
> On Mon, Nov 15, 2021 at 9:31 PM Amit Kapila  wrote:
> >
> > 5. Why do you need a separate variable rowfilter_valid to indicate
> > whether a valid row filter exists? Why exprstate is not sufficient?
> > Can you update comments to indicate why we need this variable
> > separately?
>
> I have improved the (existing) comment in v40 [1].
>
> >
> > 0004*
> > 6. In rowfilter_expr_checker(), the expression tree is traversed
> > twice, can't we traverse it once to detect all non-allowed stuff? It
> > can be sometimes costly to traverse the tree multiple times especially
> > when the expression is complex and it doesn't seem acceptable to do so
> > unless there is some genuine reason for the same.
>
> I kind of doubt there would be any perceptible difference for 2
> traverses instead of 1 because:
> a) filters are limited to simple expressions. Yes, a large boolean
> expression is possible but I don't think it is likely.
>

But in such cases it will be quite costly, and more importantly, I
don't see any good reason why we need to traverse it twice.

> b) the validation part is mostly a one-time execution only when the
> filter is created or changed.
>
> Anyway, I am happy to try to refactor the logic to a single traversal
> as suggested, but I'd like to combine those "validation" patches
> (v40-0005, v40-0006) first, so I can combine their walker logic. Is it
> OK?
>

That should be okay. You can combine the logic of v40-0005 and
v40-0006, and then change it so that you need to traverse the
expression once.

-- 
With Regards,
Amit Kapila.




Re: postgres_fdw: commit remote (sub)transactions in parallel during pre-commit

2021-11-18 Thread Etsuro Fujita
On Thu, Nov 18, 2021 at 1:09 PM Fujii Masao  wrote:
> On 2021/11/16 18:55, Etsuro Fujita wrote:
> > Sorry, my explanation was not enough, but I don’t think this is always
> > true.  Let me explain using an example:
> >
> > create server loopback foreign data wrapper postgres_fdw options
> > (dbname 'postgres', parallel_commit 'true');
> > create user mapping for current_user server loopback;
> > create table t1 (a int, b int);
> > create table t2 (a int, b int);
> > create foreign table ft1 (a int, b int) server loopback options
> > (table_name 't1');
> > create foreign table ft2 (a int, b int) server loopback options
> > (table_name 't2');
> > create role view_owner superuser;
> > create user mapping for view_owner server loopback;
> > grant SELECT on ft1 to view_owner;
> > create view v1 as select * from ft1;
> > alter view v1 owner to view_owner;
> >
> > begin;
> > insert into v1 values (10, 10);
> > insert into ft2 values (20, 20);
> > commit;
> >
> > For this transaction, since the first insert is executed as the view
> > owner while the second insert is executed as the current user, we
> > create a connection to the foreign server for each of the users to
> > execute the inserts.  This leads to sending two commit commands to the
> > foreign server at the same time during pre-commit.
> >
> > To avoid spike loads on a remote server induced by such a workload, I
> > think it’s a good idea to have a server option to control whether this
> > is enabled,
>
> I understand your point. But even if the option is disabled (i.e.,
> commit command is sent to each foreign server in serial way),
> multiple queries still can run on the server concurrently and
> which may cause performance "spike". Other clients may open several
> sessions to the server and issue queries at the same time. Other
> sessions using postgres_fdw may send commit command at the same time.
> If we want to avoid that "spike", probably we need to decrease
> max_connections or use connection pooling, etc.

I think that what you are discussing here would be a related but
different issue, because the patch doesn't increase the number of
connections to the remote server that are needed for processing a
single transaction than before.

My concern about the patch is that in parallel-commit mode,
transactions like the above example might increase the remote server's
load at transaction end more than before, while using the same number
of connections to the remote server as before, because multiple COMMIT
commands are sent to the remote server at the same time rather than
sequentially.
spike load without changing any settings on the remote server if
necessary.  Also, the option could be added at no extra cost, so there
seems to me to be no reason to remove it.

Anyway, I'd like to hear the opinions of others.

Thanks!

Best regards,
Etsuro Fujita




RE: [Proposal] Add foreign-server health checks infrastructure

2021-11-18 Thread kuroda.hay...@fujitsu.com
Dear Kato-san,

Thank you for your interest!

> > I also want you to review the postgres_fdw part,
> > but I think it should not be attached because cfbot cannot understand
> > such a dependency
> > and will throw build error. Do you know how to deal with them in this
> > case?
> 
> I don't know how to deal with them, but I hope you will attach the PoC,
> as it may be easier to review.

OK, I attached the PoC along with the dependent patches. Please see the zip 
file.
add_helth_check_... patch is written by me, and other two patches are
just copied from [1].
In the new callback function ConnectionHash is searched sequentially and
WaitEventSetWait() is performed for WL_SOCKET_CLOSED socket event.
This event is added by the dependent ones.

===
How to use
===

I'll explain how to use it briefly.

1. boot two postmaster processes. One is coordinator, and another is worker
2. set remote_servers_connection_check_interval to non-zero value at the 
coordinator
3. create tables to worker DB-cluster.
4. create foreign server, user mapping, and foreign table to coordinator.
5. connect to coordinator via psql.
6. open a transaction and access some foreign tables.
7. run the "pg_ctl stop" command against the worker DB-cluster.
8. execute some commands that do not access a foreign table.
9. Finally, the following output will be produced:

ERROR:  Postgres foreign server XXX might be down.

===
Example in some steps
===

3. at worker

```
postgres=# \d
List of relations
 Schema |  Name  | Type  | Owner  
++---+
 public | remote | table | hayato
(1 row)
```

4. at coordinator 

```
postgres=# select * from pg_foreign_server ;
  oid  | srvname | srvowner | srvfdw | srvtype | srvversion | srvacl | 
srvoptions  
---+-+--++-+++-
 16406 | remote  |   10 |  16402 | ||| 
{port=5433,dbname=postgres}
(1 row)

postgres=# select * from pg_user_mapping ;
  oid  | umuser | umserver |   umoptions   
---++--+---
 16407 | 10 |16406 | {user=hayato}
(1 row)

postgres=# \d
List of relations
 Schema |  Name  | Type  | Owner  
++---+
 public | local  | table | hayato
 public | remote | foreign table | hayato
(2 rows)
```

6-9. at coordinator

```
postgres=# begin;
BEGIN
postgres=*# select * from remote ;
 id 

  1
(1 row)

postgres=*# select * from local ;
ERROR:  Postgres foreign server remote might be down.
postgres=!#
```

Note that some keepalive settings are needed
if you want to detect cable breakdown events.
In my understanding, the following parameters are needed as server options:

* keepalives_idle
* keepalives_count
* keepalives_interval

[1]: https://commitfest.postgresql.org/35/3098/

Best Regards,
Hayato Kuroda
FUJITSU LIMITED



v01_add_checking_infrastracture.patch
Description: v01_add_checking_infrastracture.patch


Re: Should rename "startup process" to something else?

2021-11-18 Thread Amul Sul
On Thu, Nov 18, 2021 at 6:06 PM Andrey Borodin  wrote:
>
>
>
> > On 15 Nov 2021, at 19:32, Rushabh Lathia wrote:
> >
> > Open for suggestions and thoughts.
>
>
> How about walapplier ?
> Similar to walsender, walreceiver...
>
Or maybe walreplayer ?

Regards,
Amul




Re: Should rename "startup process" to something else?

2021-11-18 Thread Andrey Borodin



> On 15 Nov 2021, at 19:32, Rushabh Lathia wrote:
> 
> Open for suggestions and thoughts.


How about walapplier ?
Similar to walsender, walreceiver...

Best regards, Andrey Borodin.



Re: xlog.c: removing ReadRecPtr and EndRecPtr

2021-11-18 Thread Amul Sul
On Thu, Nov 18, 2021 at 3:31 AM Robert Haas  wrote:
>
> I spent a lot of time trying to figure out why xlog.c has global
> variables ReadRecPtr and EndRecPtr instead of just relying on the
> eponymous structure members inside the XLogReaderState. I concluded
> that the values are the same at most points in the code, and thus that
> we could just use xlogreaderstate->{Read,End}RecPtr instead. There are
> two places where this wouldn't produce the same results we're getting
> today. Both of those appear to be bugs.
>
> The reason why it's generally the case that ReadRecPtr ==
> xlogreaderstate->ReadRecPtr and likewise for EndRecPtr is that
> ReadRecord() calls XLogReadRecord(), which sets the values inside the
> XLogReaderState, and then immediately assigns the values stored there
> to the global variables. There's no other code that changes the other
> global variables, and the only other code that changes the structure
> members is XLogBeginRead(). So the values can be unequal from the time
> XLogBeginRead() is called up until the time that the XLogReadRecord()
> call inside ReadRecord() returns. In practice, StartupXLOG() always
> calls ReadRecord() right after it calls XLogBeginRead(), and
> ReadRecord() does not reference either global variable before calling
> XLogReadRecord(), so the problem surface is limited to code that runs
> underneath XLogReadRecord(). XLogReadRecord() is part of xlogreader.c,
> but it uses a callback interface: the callback is XLogPageRead(),
> which itself references EndRecPtr, and also calls
> WaitForWALToBecomeAvailable(), which in turn calls
> rescanLatestTimeLine(), which also references EndRecPtr. So these are
> the two problem cases: XLogPageRead(), and rescanLatestTimeLine().
>
> In rescanLatestTimeLine(), the problem is IMHO probably serious enough
> to justify a separate commit with back-patching. The problem is that
> EndRecPtr is being used here to reject impermissible attempts to
> switch to a bad timeline, but if pg_wal starts out empty, EndRecPtr
> will be 0 here, which causes the code to fail to detect a case that
> should be prohibited. Consider the following test case:
>
> - create a primary
> - create standby #1 from the primary
> - start standby #1 and promote it
> - take a backup from the primary using -Xnone to create standby #2
> - clear primary_conninfo on standby #2 and then start it
> - copy 0002.history from standby #1 to standby #2
>
> You get:
>
> 2021-11-17 15:34:26.213 EST [7474] LOG:  selected new timeline ID: 2
>
> But with the attached patch, you get:
>
> 2021-11-17 16:12:01.566 EST [20900] LOG:  new timeline 2 forked off
> current database system timeline 1 before current recovery point
> 0/A60
>

Somehow I am getting the same log with and without the patch.

> Had the problem occurred at some later point in the WAL stream rather
> than before fetching the very first record, I think everything is
> fine; at that point, I think that the global variable EndRecPtr will
> be initialized. I'm not entirely sure that it contains exactly the
> right value, but it's someplace in the right ballpark, at least.
>

Agree, change seems pretty much reasonable.

> In XLogPageRead(), the problem is just cosmetic. We're only using
> EndRecPtr as an argument to emode_for_corrupt_record(), which is all
> about suppressing duplicate complaints about the same LSN. But if the
> xlogreader has been repositioned using XLogBeginRead() since the last
> call to ReadRecord(), or if there are no preceding calls to
> ReadRecord(), then the value of EndRecPtr here is left over from the
> previous read position and is not particularly related to the record
> we're reading now. xlogreader->EndRecPtr, OTOH, is. This doesn't seem
> worth a separate commit to me, or a back-patch, but it seems worth
> fixing while I'm cleaning up these global variables.
>
LGTM.

Regards,
Amul




Re: Skipping logical replication transactions on subscriber side

2021-11-18 Thread Masahiko Sawada
On Thu, Nov 18, 2021 at 5:45 PM tanghy.f...@fujitsu.com
 wrote:
>
> On Tuesday, November 16, 2021 2:31 PM Masahiko Sawada  
> wrote:
> >
> > Right. I've fixed this issue and attached an updated patch.
> >
> >
>
> Thanks for your patch.
>
> I read the discussion about stats entries for the table sync worker [1];
> the statistics are retained after the table sync worker finishes its jobs,
> and users can remove them via the pg_stat_reset_subscription_worker
> function.
>
> But I notice that, if a table sync worker has finished its jobs, the error
> reported by this worker will not be shown in the
> pg_stat_subscription_workers view. (It seems to be caused by this
> condition: "WHERE srsubstate <> 'r'".) Is it intentional? I think this may
> result in users not knowing that the statistics still exist, so they won't
> remove them manually. And that is not friendly to users' storage, right?
>

You're right. The condition "WHERE srsubstate <> 'r'" should be removed.
I'll do that change in the next version patch. Thanks!

Regards,

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/




Re: Non-superuser subscription owners

2021-11-18 Thread Amit Kapila
On Thu, Nov 4, 2021 at 1:20 AM Mark Dilger  wrote:
>
> > On Nov 1, 2021, at 10:58 AM, Mark Dilger  
> > wrote:
> >
> > ALTER SUBSCRIPTION..[ENABLE | DISABLE] do not synchronously start or stop 
> > subscription workers.  The ALTER command updates the catalog's subenabled 
> > field, but workers only lazily respond to that.  Disabling and enabling the 
> > subscription as part of the OWNER TO would not reliably accomplish anything.
>
> I have rethought my prior analysis.  The problem in the previous patch was 
> that the subscription apply workers did not check for a change in ownership 
> the way they checked for other changes, instead only picking up the new 
> ownership information when the worker restarted for some other reason.  This 
> next patch set fixes that.  The application of a change record may continue 
> under the old ownership permissions when a concurrent command changes the 
> ownership of the subscription, but the worker will pick up the new 
> permissions before applying the next record.
>

Are you talking about the below change in the above paragraph?

@@ -2912,6 +2941,7 @@ maybe_reread_subscription(void)
  strcmp(newsub->slotname, MySubscription->slotname) != 0 ||
  newsub->binary != MySubscription->binary ||
  newsub->stream != MySubscription->stream ||
+ newsub->owner != MySubscription->owner ||
  !equal(newsub->publications, MySubscription->publications))
  {

If so, I am not sure how it will ensure that we check the ownership
change before applying each change? I think this will be invoked at
each transaction boundary, so, if there is a transaction with a large
number of changes, all the changes will be processed under the
previous owner.

-- 
With Regards,
Amit Kapila.




Re: Failed transaction statistics to measure the logical replication progress

2021-11-18 Thread vignesh C
On Tue, Nov 16, 2021 at 6:04 PM osumi.takami...@fujitsu.com
 wrote:
>
> On Monday, November 15, 2021 9:14 PM I wrote:
> > I've conducted some update for this.
> > (The rebased part is only C code and checked by pgindent)
> I'll update my patches since a new skip xid patch
> has been shared in [1].
>
> This version includes some minor renames of functions
> that are related to transaction sizes.

Thanks for the updated patch, Few comments:
1) Since pgstat_get_subworker_prepared_txn is called from only one
place and create is passed as true, we can remove the create function
parameter, or the function could be removed.
+ * Return subscription worker entry with the given subscription OID and
+ * gid.
+ * --
+ */
+static PgStat_SW_PreparedXactEntry *
+pgstat_get_subworker_prepared_txn(Oid databaseid, Oid subid,
+ char *gid, bool create)
+{
+   PgStat_StatDBEntry *dbentry;
+   PgStat_SW_PreparedXactKey key;

 2) Include subworker prepared transactions also
  /*
* Don't create tables/functions/subworkers hashtables for
* uninteresting databases.
*/
if (onlydb != InvalidOid)
{
if (dbbuf.databaseid != onlydb &&
dbbuf.databaseid != InvalidOid)
break;
}

3) Similarly it should be mentioned in:
reset_dbentry_counters function header, pgstat_read_db_statsfile
function header and pgstat_get_db_entry function comments.

4) I felt we can remove "COMMIT of streaming transaction", since only
commit and commit prepared are the user operations. Shall we change it
to "COMMIT and COMMIT PREPARED will increment this counter."
+   commit_count bigint
+  
+  
+   Number of transactions successfully applied in this subscription.
+   COMMIT, COMMIT of streaming transaction and COMMIT PREPARED increments
+   this counter.
+  
+ 

5) PgStat_SW_PreparedXactEntry should be before PgStat_SW_PreparedXactKey
 PgStat_StatSubWorkerEntry
 PgStat_StatSubWorkerKey
+PgStat_SW_PreparedXactKey
+PgStat_SW_PreparedXactEntry
 PgStat_StatTabEntry
 PgStat_SubXactStatus

6)  This change is not required
@@ -293,6 +306,7 @@ static inline void cleanup_subxact_info(void);
 static void stream_cleanup_files(Oid subid, TransactionId xid);
 static void stream_open_file(Oid subid, TransactionId xid, bool first);
 static void stream_write_change(char action, StringInfo s);
+
 static void stream_close_file(void);

Regards,
Vignesh




Re: Should we improve "PID XXXX is not a PostgreSQL server process" warning for pg_terminate_backend(<>)?

2021-11-18 Thread Bharath Rupireddy
On Thu, Nov 18, 2021 at 12:30 AM Euler Taveira  wrote:
>
> On Mon, Nov 15, 2021, at 4:27 AM, Bharath Rupireddy wrote:
>
> As there is some interest shown in this thread at [1], I'm attaching a
> new v3 patch here. Please review it.
>
> I took a look at this patch. I have a few comments.

Thanks a lot.

> s/shared-memory/shared memory/

I don't think we need to change that. This comment is picked up from
another AuxiliaryPidGetProc call in pgstatfuncs.c, and in core we have
lots of instances of the term "shared-memory". I think we can keep it
as is and not attempt to change it here, at least not in this thread.

> syslogger and statistics collector don't have a procArray entry so you could
> probably provide a new function that checks if it is an auxiliary process.
> AuxiliaryPidGetProc() does not return all auxiliary processes; syslogger and
> statistics collector don't have a procArray entry. You can use their PIDs
> (SysLoggerPID and PgStatPID) to provide an accurate information.
>
> + if (proc)
> + ereport(WARNING,
> + (errmsg("signalling PostgreSQL server process with PID %d is not allowed",
>
> I would say "signal PostgreSQL auxiliary process PID 1234 is not allowed".

Although we have recently defined the term "auxiliary process" in the
glossary, I haven't found (on a quick look) any user-facing log
messages using it. And if we just say "we can't signal an auxiliary
process", it doesn't look complete (we would still end up hitting the
other messages below for the syslogger and stats collector). Note that
AuxiliaryPidGetProc doesn't return a PGPROC entry for the syslogger
and stats collector, which according to the glossary are auxiliary
processes.

> + ereport(WARNING,
> + (errmsg("PID %d is not a PostgreSQL server process", pid)));
>
> I would say "PID 1234 is not a PostgreSQL backend process". That's the 
> glossary
> terminology.

This looks okay: along with the other new messages, it says that the
calling function is only allowed to signal a backend process, not the
postmaster, another PostgreSQL process (an auxiliary process), or a
non-postgres process.

The following is what I made up in my mind after looking at other
existing messages, like [1] and the review comments:
errmsg("cannot send signal to postmaster %d", pid,   --> the process
is postmaster but the caller isn't allowed to signal.
errmsg("cannot send signal to PostgreSQL server process %d", pid,
--> the process is a postgresql process but the caller isn't allowed
to signal.
errmsg("PID %d is not a PostgreSQL backend process", pid,  ---> it may
be another postgres processes like syslogger or stats collector or
non-postgres process but not a backend process.

Thoughts?

[1]
(errmsg("could not send signal to process %d: %m", pid)));
(errmsg("failed to send signal to postmaster: %m")));

Regards,
Bharath Rupireddy.




Re: Non-superuser subscription owners

2021-11-18 Thread Amit Kapila
On Wed, Nov 17, 2021 at 11:56 PM Mark Dilger
 wrote:
>
> > On Nov 17, 2021, at 9:33 AM, Jeff Davis  wrote:
> >
>
> > This would not address the weirdness of the existing code where a
> > superuser loses their superuser privileges but still owns a
> > subscription. But perhaps we can solve that a different way, like just
> > performing a check when someone loses their superuser privileges that
> > they don't own any subscriptions.
>
> I gave that a slight amount of thought during the design of this patch, but 
> didn't think we could refuse to revoke superuser on such a basis, and didn't 
> see what we should do with the subscription other than have it continue to be 
> owned by the recently-non-superuser.  If you have a better idea, we can 
> discuss it, but to some degree I think that is also orthogonal to the purpose 
> of this patch.  The only sense in which this patch depends on that issue is 
> that this patch proposes that non-superuser subscription owners are already 
> an issue, and therefore that this patch isn't creating a new issue, but 
> rather making more sane something that already can happen.
>

Don't we want to close this gap irrespective of the other part of the
feature? I mean if we take out the part of your 0003 patch that checks
whether the current user has permission to perform a particular
operation on the target table then the gap related to the owner losing
superuser privileges should be addressed.

-- 
With Regards,
Amit Kapila.




Re: row filtering for logical replication

2021-11-18 Thread Greg Nancarrow
On Thu, Nov 18, 2021 at 12:33 PM Peter Smith  wrote:
>
> PSA new set of v40* patches.
>

Thanks for the patch updates.

A couple of comments so far:

(1) compilation warning
With the patches applied, there's a single compilation warning when
Postgres is built:

pgoutput.c: In function ‘pgoutput_row_filter_init’:
pgoutput.c:854:8: warning: unused variable ‘relid’ [-Wunused-variable]
  Oid   relid = RelationGetRelid(relation);
^

> v40-0004 = combine using OR instead of AND
> - this is a new patch
> - new behavior. multiple filters now combine by OR instead of AND
> [Tomas 23/9] #3
>

(2) missing test case
It seems that the current tests are not testing the
multiple-row-filter case (n_filters > 1) in the following code in
pgoutput_row_filter_init():

rfnode = n_filters > 1 ? makeBoolExpr(OR_EXPR, rfnodes, -1) :
linitial(rfnodes);

I think a test needs to be added similar to the customers+countries
example that Tomas gave (where there is a single subscription to
multiple publications of the same table, each of which has a
row-filter).


Regards,
Greg Nancarrow
Fujitsu Australia




Re: Should rename "startup process" to something else?

2021-11-18 Thread Bharath Rupireddy
On Mon, Nov 15, 2021 at 8:03 PM Rushabh Lathia  wrote:
> Robert and I wondered whether we would like to rename the startup
> process.  The reason for doing this is that the current name doesn't
> make any sense, as in the stand-by mode replay loop as the main
> loop, the startup process stays around.

The startup process's main job seems to be to do some initialization
required for recovery and then perform the recovery, exiting if it is
crash recovery or staying around if it is standby recovery. Apart from
that, I'm not sure it does any non-recovery tasks. Does the startup
process have any relevance or work to do if the server state is
DB_SHUTDOWNED, i.e. the server was shut down properly? I think yes:
although it doesn't enter the recovery/REDO loop, it does a bunch of
other things like updating the control file, pre-allocating WAL files,
updating shared memory state to allow other backends to write WAL, and
so on. Keeping this in mind, just renaming the startup process to
"recovery" or something else doesn't look good IMO. Instead, we can
live with it and be happy with the ps display update that we do in
XLogFileRead.

> I think a better name for the process may be “recovery” or “WAL_replay”
> or maybe someone has a better idea.  One can say that startup is a very
> generic name, so in the future, it gives flexibility in terms of assigning
> any sort of 'work’.  But in standby mode, the name “startup” doesn’t
> make sense.
>
> Open for suggestions and thoughts.

If at all we were to change the startup process name to "recovery" or
something else, it's not going to be a tiny change: we need to change
the code along with user-facing error messages, docs, and tests, which
isn't good for code maintainability, and especially for the diff it
creates with the back branches that we support.

Others may have better thoughts.

Regards,
Bharath Rupireddy.




Re: Propose a new hook for mutating the query bounds

2021-11-18 Thread Xiaozhe Yao
Hi,

Thanks for the previous feedbacks!

> The way the hook is used seems pretty inconvenient, though.

I see the problem, and I agree.

I looked into how other hooks work, and I am wondering if it would be OK
to pass a pointer to the hook and let the hook check whether any
applicable information exists. If there is none, the hook just returns
false and we let the rest of the code handle it. If it returns true, we
take the selectivity from the hook and return it. So something like:

```
if (clauselist_selectivity_hook &&
(*clauselist_selectivity_hook) (root, clauses, varRelid, jointype, sjinfo,
use_extended_stats, &s1))
{
return s1;
}
```

What I am trying to mock is the get_index_stats_hook (
https://github.com/taminomara/psql-hooks/blob/master/Detailed.md#get_index_stats_hook).


Am I understanding your idea correctly and does this look somehow better?

Best regards,
Xiaozhe

On Wed, Nov 17, 2021 at 7:47 PM Tomas Vondra 
wrote:

> On 11/17/21 16:39, Xiaozhe Yao wrote:
> > Hi Tom,
> >
> > Thanks for your feedback. I completely agree with you that a
> > higher-level hook is better suited for this case. I have adjusted the
> > PoC patch to this email.
> >
> > Now it is located in the clauselist_selectivity_ext function, where we
> > first check if the hook is defined. If so, we let the hook estimate the
> > selectivity and return the result. With this one, I can also develop
> > extensions to better estimate the selectivity.
> >
>
> I think clauselist_selectivity is the right level, because this is
> pretty similar to what extended statistics are doing. I'm not sure if
> the hook should be called in clauselist_selectivity_ext or in the plain
> clauselist_selectivity. But it should be in clauselist_selectivity_or
> too, probably.
>
> The way the hook is used seems pretty inconvenient, though. I mean, if
> you do this
>
>  if (clauselist_selectivity_hook)
>  return clauselist_selectivity_hook(...);
>
> then what will happen when the ML model has no information applicable to
> a query? This is called for all relations, all conditions, etc. and
> you've short-circuited all the regular code, so the hook will have to
> copy all of that. Seems pretty silly and fragile.
>
> IMO the right approach is what statext_clauselist_selectivity is doing,
> i.e. estimate clauses, mark them as estimated in a bitmap, and let the
> rest of the existing code take care of the remaining clauses. So more
> something like
>
>  if (clauselist_selectivity_hook)
>  s1 *= clauselist_selectivity_hook(..., &estimatedclauses);
>
>
> regards
>
> --
> Tomas Vondra
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>
diff --git a/src/backend/optimizer/path/clausesel.c b/src/backend/optimizer/path/clausesel.c
index d263ecf082..94f7993529 100644
--- a/src/backend/optimizer/path/clausesel.c
+++ b/src/backend/optimizer/path/clausesel.c
@@ -26,6 +26,9 @@
 #include "utils/lsyscache.h"
 #include "utils/selfuncs.h"
 
+/* Hooks for plugins to get control when we ask for selectivity */
+clauselist_selectivity_hook_type clauselist_selectivity_hook = NULL;
+
 /*
  * Data structure for accumulating info about possible range-query
  * clause pairs in clauselist_selectivity.
@@ -130,6 +133,18 @@ clauselist_selectivity_ext(PlannerInfo *root,
 	ListCell   *l;
 	int			listidx;
 
+	/*
+	* If we have a hook for selectivity estimation and it returns True, then go for it.
+	*/
+	if (clauselist_selectivity_hook && 
+		(*clauselist_selectivity_hook) (root, clauses, varRelid, jointype, sjinfo, use_extended_stats, &s1))
+		{
+			/*
+			* The hook takes control of estimating the selectivity
+			* and saves the estimated selectivity to s1.
+			*/
+			return s1;
+		}
 	/*
 	 * If there's exactly one clause, just go directly to
 	 * clause_selectivity_ext(). None of what we might do below is relevant.
@@ -370,7 +385,19 @@ clauselist_selectivity_or(PlannerInfo *root,
 	Bitmapset  *estimatedclauses = NULL;
 	ListCell   *lc;
 	int			listidx;
-
+	
+	/*
+	 * If we have a hook for selectivity estimation and it returns True, then go for it.
+	 */
+	if (clauselist_selectivity_hook && 
+		(*clauselist_selectivity_hook) (root, clauses, varRelid, jointype, sjinfo, use_extended_stats, &s1))
+		{
+			/*
+			* The hook takes control of estimating the selectivity
+			* and saves the estimated selectivity to s1.
+			*/
+			return s1;
+		}
 	/*
 	 * Determine if these clauses reference a single relation.  If so, and if
 	 * it has extended statistics, try to apply those.
diff --git a/src/include/optimizer/optimizer.h b/src/include/optimizer/optimizer.h
index 41b49b2662..7e10144822 100644
--- a/src/include/optimizer/optimizer.h
+++ b/src/include/optimizer/optimizer.h
@@ -202,4 +202,17 @@ extern int	locate_var_of_level(Node *node, int levelsup);
 extern List *pull_var_clause(Node *node, int flags);
 extern Node *flatten_join_alias_vars(Query *query, Node *node);
 
+/* Hooks for external selectivity estimation */
+typedef bool (*clauselist_selectivity_hook_type) (PlannerInfo *root,
+		List *clauses,
+		int varRelid,
+		JoinType jointype,
+		SpecialJoinInfo *sjinfo,
+		bool use_extended_stats,
+		Selectivity *s1);
+extern clauselist_selectivity_hook_type clauselist_selectivity_hook;

RE: Skipping logical replication transactions on subscriber side

2021-11-18 Thread tanghy.f...@fujitsu.com
On Tuesday, November 16, 2021 2:31 PM Masahiko Sawada  
wrote:
> 
> Right. I've fixed this issue and attached an updated patch.
> 
>

Thanks for your patch.

I read the discussion about stats entries for table sync workers[1]: the
statistics are retained after a table sync worker finishes its job, and
users can remove them via the pg_stat_reset_subscription_worker function.

But I notice that, once a table sync worker has finished its job, the error
reported by that worker is no longer shown in the pg_stat_subscription_workers
view. (It seems to be caused by this condition: "WHERE srsubstate <> 'r'".)
Is that intentional? I think this may lead to a situation where users don't
know the statistics still exist, and so won't remove them manually. And that
is not friendly to users' storage, right?

[1] 
https://www.postgresql.org/message-id/CAD21AoAT42mhcqeB1jPfRL1%2BEUHbZk8MMY_fBgsyZvJeKNpG%2Bw%40mail.gmail.com

Regards
Tang


Re: pg_waldump stucks with options --follow or -f and --stats or -z

2021-11-18 Thread Bharath Rupireddy
On Thu, Nov 18, 2021 at 1:51 PM Michael Paquier  wrote:
>
> On Thu, Nov 18, 2021 at 01:32:08PM +0530, Bharath Rupireddy wrote:
> > Hm. So, pg_waldump can have handlers for SIGINT, SIGTERM and SIGQUIT,
> > and then it should emit the computed stats in those handlers for the
> > combinations "-s/-f/-z" and "-s/-e/-f/-z". I'm okay with this
> > behaviour. Michael Paquier had suggested the same upthread. If okay, I
> > will work on that patch.
>
> While on it, I am pretty sure that we should add in the stats report
> the start and end LSNs matching with the reports.  If you use --follow
> and the call is interrupted, we would not really know to which part of
> the WAL stream a report applies to without this information, likely in
> the shape of an extra header.

Currently, SIGINT, SIGTERM and SIGQUIT will cause a pg_waldump process
running with the --follow option to get killed. And it is a good idea to
specify the range of LSNs (start and end) up to which the stats are
computed, maybe as another printf message at the end of the stats report,
after the "Total" section, or at the beginning of the stats report.
Something like "summary of the statistics computed from << start LSN >> to
<< end LSN >> are:", see [1].

[1] ubuntu3:~/postgres/src/bin/pg_waldump$ ./pg_waldump -p data -s 0/148 --stats
summary of the statistics computed from 0/148 to 0/158ABCD are:
Type                        N      (%)   Record size      (%)   FPI size      (%)   Combined size      (%)
----                        -      ---   -----------      ---   --------      ---   -------------      ---
XLOG                       13 (  0.13)          1138 (  0.18)       8424 (  2.46)            9562 (  0.99)
Transaction                14 (  0.14)          1407 (  0.22)          0 (  0.00)            1407 (  0.15)
Storage                     4 (  0.04)           172 (  0.03)          0 (  0.00)             172 (  0.02)
CLOG                        0 (  0.00)             0 (  0.00)          0 (  0.00)               0 (  0.00)
Database                    1 (  0.01)            42 (  0.01)          0 (  0.00)              42 (  0.00)
Tablespace                  0 (  0.00)             0 (  0.00)          0 (  0.00)               0 (  0.00)
MultiXact                   0 (  0.00)             0 (  0.00)          0 (  0.00)               0 (  0.00)
RelMap                      0 (  0.00)             0 (  0.00)          0 (  0.00)               0 (  0.00)
Standby                    37 (  0.36)          2318 (  0.37)          0 (  0.00)            2318 (  0.24)
Heap2                      74 (  0.73)         12841 (  2.05)     121184 ( 35.46)          134025 ( 13.84)
Heap                     9005 ( 88.23)        532313 ( 84.91)      84208 ( 24.64)          616521 ( 63.64)
Btree                    1058 ( 10.37)         76710 ( 12.24)     127932 ( 37.43)          204642 ( 21.13)
Hash                        0 (  0.00)             0 (  0.00)          0 (  0.00)               0 (  0.00)
Gin                         0 (  0.00)             0 (  0.00)          0 (  0.00)               0 (  0.00)
Gist                        0 (  0.00)             0 (  0.00)          0 (  0.00)               0 (  0.00)
Sequence                    0 (  0.00)             0 (  0.00)          0 (  0.00)               0 (  0.00)
SPGist                      0 (  0.00)             0 (  0.00)          0 (  0.00)               0 (  0.00)
BRIN                        0 (  0.00)             0 (  0.00)          0 (  0.00)               0 (  0.00)
CommitTs                    0 (  0.00)             0 (  0.00)          0 (  0.00)               0 (  0.00)
ReplicationOrigin           0 (  0.00)             0 (  0.00)          0 (  0.00)               0 (  0.00)
Generic                     0 (  0.00)             0 (  0.00)          0 (  0.00)               0 (  0.00)
LogicalMessage              0 (  0.00)             0 (  0.00)          0 (  0.00)               0 (  0.00)
--------                -----        ---------------        ---------------        ---------------
Total                   10206        626941 [64.72%]        341748 [35.28%]        968689 [100%]

Regards,
Bharath Rupireddy.


Re: pg_waldump stucks with options --follow or -f and --stats or -z

2021-11-18 Thread Michael Paquier
On Thu, Nov 18, 2021 at 01:32:08PM +0530, Bharath Rupireddy wrote:
> Hm. So, pg_waldump can have handlers for SIGINT, SIGTERM and SIGQUIT,
> and then it should emit the computed stats in those handlers for the
> combinations "-s/-f/-z" and "-s/-e/-f/-z". I'm okay with this
> behaviour. Michael Paquier had suggested the same upthread. If okay, I
> will work on that patch.

While on it, I am pretty sure that we should add in the stats report
the start and end LSNs matching with the reports.  If you use --follow
and the call is interrupted, we would not really know to which part of
the WAL stream a report applies to without this information, likely in
the shape of an extra header.
--
Michael


signature.asc
Description: PGP signature


Re: CREATE tab completion

2021-11-18 Thread Kyotaro Horiguchi
At Thu, 18 Nov 2021 17:17:25 +0900, Michael Paquier wrote:
> On Thu, Nov 18, 2021 at 04:25:30PM +0900, Ken Kato wrote:
> > For this part, I did the following:
> > +   else if (TailMatches("CREATE", "SEQUENCE", MatchAny, "AS") ||
> > +TailMatches("CREATE", "TEMP|TEMPORARY", "SEQUENCE", 
> > MatchAny, "AS"))
> > +   COMPLETE_WITH("smallint", "integer", "bigint");
> > 
> > Am I doing this right? or Are there better ways to do it?
> 
> That looks fine per se.

FWIW, I would be a bit perplexed to see type names suggested in
upper-cases, even if it is acceptable by the parser.  If you type-in
the following phrase:

=# type boo

it is completed to "boolean" but,

=# type BOO

doesn't respond.

So we could use COMPLETE_WITH_CS instead, so that CREATE SEQUENCE
behaves the same way as the existing behavior of TYPE.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: CREATE tab completion

2021-11-18 Thread Michael Paquier
On Thu, Nov 18, 2021 at 04:25:30PM +0900, Ken Kato wrote:
> For this part, I did the following:
> + else if (TailMatches("CREATE", "SEQUENCE", MatchAny, "AS") ||
> +  TailMatches("CREATE", "TEMP|TEMPORARY", "SEQUENCE", 
> MatchAny, "AS"))
> + COMPLETE_WITH("smallint", "integer", "bigint");
> 
> Am I doing this right? or Are there better ways to do it?

That looks fine per se.

> +/* CREATE SCHEMA */
> + else if (Matches("CREATE", "SCHEMA"))
> + COMPLETE_WITH("AUTHORIZATION");
> + else if (Matches("CREATE", "SCHEMA") && TailMatches("AUTHORIZATION"))
> + COMPLETE_WITH_QUERY(Query_for_list_of_roles);

The part about CREATE SCHEMA was itching me a bit, until I recalled
this recent thread which has a more complete logic:
https://www.postgresql.org/message-id/87im0efqhp@wibble.ilmari.org

The rest looks good, I'll take care of that in a bit.
--
Michael


signature.asc
Description: PGP signature


Re: pg_waldump stucks with options --follow or -f and --stats or -z

2021-11-18 Thread Bharath Rupireddy
On Thu, Nov 18, 2021 at 12:05 AM Alvaro Herrera  wrote:
>
> On 2021-Nov-17, Bharath Rupireddy wrote:
>
> > On Wed, Nov 17, 2021 at 7:49 AM Michael Paquier  wrote:
> > >
> > > At the same time, we could also just let things as they are.  --follow
> > > and --stats being specified together is what the user looked for, so
> > > they get what they wanted.
> >
> > I think the existing way of pg_waldump getting stuck with the
> > combination of options  "-s/-f/-z" and "-s/-e/-f/-z" doesn't look good
> > for an end user.
>
> I agree that it's not right, but I don't think the solution is to ban
> the combination.  I think what we should do is change the behavior to
> make the combination potentially useful, so that it waits until program
> termination and print the summary then.  Consider "strace -c" as a
> precedent: it will also "become stuck" until you kill it, and then it'll
> print a nice table, just like the pg_waldump -z gives you.
>
> I think I would even have had occasion to use this as a feature in the
> past ...

Hm. So, pg_waldump can have handlers for SIGINT, SIGTERM and SIGQUIT,
and then it should emit the computed stats in those handlers for the
combinations "-s/-f/-z" and "-s/-e/-f/-z". I'm okay with this
behaviour. Michael Paquier had suggested the same upthread. If okay, I
will work on that patch.

Thoughts?

Regards,
Bharath Rupireddy.