Re: [PATCH] postgres_fdw: suppress explicit casts in text:text comparisons (was: column option to override foreign types)

2021-10-23 Thread Dian M Fay
On Sun Sep 5, 2021 at 6:43 PM EDT, Tom Lane wrote:
> "Dian M Fay"  writes:
> > [ 0001-Suppress-explicit-casts-of-text-constants-in-postgre.patch ]
>
> I took a quick look at this. The restriction to type text seems like
> very obviously a hack rather than something we actually want; wouldn't
> it mean we fail to act in a large fraction of the cases where we'd
> like to suppress the cast?
>
> A second problem is that I don't think the business with a
> const_showtype
> context field is safe at all. As you've implemented it here, it would
> affect the entire RHS tree, including constants far down inside complex
> expressions that have nothing to do with the top-level semantics.
> (I didn't look closely, but I wonder if the regression failure you
> mentioned is associated with that.)
>
> I think that we only want to suppress the cast in cases where
> (1) the constant is directly an operand of the operator we're
> expecting the remote parser to use its same-type heuristic for, and
> (2) the constant will be deparsed as a string literal. (If it's
> deparsed as a number, boolean, etc, then it won't be initially
> UNKNOWN, so that heuristic won't be applied.)
>
> Now point 1 means that we don't really need to mess with keeping
> state in the recursion context. If we've determined at the level
> of the OpExpr that we can do this, including checking that the
> RHS operand IsA(Const), then we can just invoke deparseConst() on
> it directly instead of recursing via deparseExpr().
>
> Meanwhile, I suspect that point 2 might be best checked within
> deparseConst() itself, as that contains both the decision and the
> mechanism about how the Const will be printed. So that suggests
> that we should invent a new showtype code telling deparseConst()
> to act this way, and then supply that code directly when we
> invoke deparseConst directly from deparseOpExpr.
>
> BTW, don't we also want to be able to optimize cases where the Const
> is on the LHS rather than the RHS?
>
> regards, tom lane

Thanks Tom, that makes way more sense! I've attached a new patch which
tests operands and makes sure one side is a Const before feeding it to
deparseConst with a new showtype code, -2. The one regression is gone,
but I've left a couple of test output discrepancies for now which
showcase lost casts on the following predicates:

* date(c5) = '1970-01-17'::date
* ctid = '(0,2)'::tid

These aren't exactly failures -- both implicit string comparisons work
just fine -- but I don't know Postgres well enough to be sure that
that's true more generally. I did try checking that the non-Const member
of the predicate is a Var; that left the date cast alone, since date(c5)
is a FuncExpr, but obviously can't do anything about the tid.

There's also an interesting case where `val::text LIKE 'foo'` works when
val is an enum column in the local table, and breaks, castless, with an
operator mismatch when it's altered to text: Postgres' statement parser
recognizes the cast as redundant and creates a Var node instead of a
RelabelType (as it will for, say, `val::varchar(10)`) before the FDW is
even in the picture. It's a little discomfiting, but I suppose a certain
level of "caveat emptor" entails when disregarding foreign types.

> (val as enum on local and remote)
> explain verbose select * from test where (val::text) like 'foo';
> 
>  Foreign Scan on public.test  (cost=100.00..169.06 rows=8 width=28)
>Output: id, val, on_day, ts, ts2
>Filter: ((test.val)::text ~~ 'foo'::text)
>Remote SQL: SELECT id, val, on_day, ts, ts2 FROM public.test
>
> (val as local text, remote enum)
> explain verbose select * from test where (val::text) like 'foo';
> 
>  Foreign Scan on public.test  (cost=100.00..122.90 rows=5 width=56)
>Output: id, val, on_day, ts, ts2
>Remote SQL: SELECT id, val, on_day, ts, ts2 FROM public.test WHERE ((val 
> ~~ 'foo'))
>
> explain verbose select * from test where (val::varchar(10)) like 'foo';
>
>  Foreign Scan on public.test  (cost=100.00..125.46 rows=5 width=56)
>Output: id, val, on_day, ts, ts2
>Remote SQL: SELECT id, val, on_day, ts, ts2 FROM public.test WHERE 
> ((val::character varying(10) ~~ 'foo'))

Outside that, deparseConst also contains a note about keeping the code
in sync with the parser (make_const in particular); from what I could
tell, I don't think there's anything in this that necessitates changes
there.
From 92ca385ff9891008e48d975a622f8b56b921 Mon Sep 17 00:00:00 2001
From: Dian M Fay 
Date: Sat, 23 Oct 2021 00:54:48 -0400
Subject: [PATCH] Suppress explicit casts of safe const operands

Adds a new showtype code -2 to deparseConst in order to skip casting for
string literals when deparseOpExpr determines that the remote server can
be relied on to match operand types.
---
 contrib/postgres_fdw/deparse.c|  36 -
 .../postgres_fdw/expected/postgres_fdw.out| 132 +-
 contrib/postgres_fdw/sql/postgres_fdw.sql |  18 +++
 3 files changed, 146 insertions(+), 40 

Re: pg_receivewal starting position

2021-10-23 Thread Bharath Rupireddy
On Sun, Oct 24, 2021 at 4:40 AM Michael Paquier  wrote:
> On Sat, Oct 23, 2021 at 10:46:30PM +0530, Bharath Rupireddy wrote:
> > 2) As I previously mentioned, we are not copying the slot contents
> > while holding the spinlock, it's just we are taking the memory address
> > and releasing the lock, so there is a chance that the memory we are
> > looking at can become unavailable or stale while we access
> > slot_contents. So, I suggest we do the memcpy of the *slot to
> > slot_contents. I'm sure the memcpy ing the entire ReplicationSlot
> > contents will be costlier, so let's just take the info that we need
> > (data.database, data.restart_lsn) into local variables while we hold
> > the spin lock
>
> The style of the patch is consistent with what we do in other areas
> (see pg_get_replication_slots as one example).
>
> > + /* Copy slot contents while holding spinlock */
> > + SpinLockAcquire(>mutex);
> > + slot_contents = *slot;
>
> And what this does is to copy the contents of the slot into a local
> area (note that we use a NameData pointing to an area with
> NAMEDATALEN).  Aka if the contents of *slot are changed by whatever
> reason (this cannot change as of the LWLock acquired), what we have
> saved is unchanged as of this command's context.

pg_get_replication_slots holds the ReplicationSlotControlLock until
the end of the function so it can be assured that *slot contents will
not change. In ReadReplicationSlot, the ReplicationSlotControlLock is
released immediately after taking *slot pointer into slot_contents.
Isn't it better if we hold the lock until the end of the function so
that we can avoid the slot contents becoming stale problems?

Having said that, the ReplicationSlotCreate,
SearchNamedReplicationSlot (when need_lock is true) etc. release the
lock immediately. I'm not sure if I should ignore this staleness
problem in ReadReplicationSlot.

> > 3) The data that the new command returns to the client can actually
> > become stale while it is captured and in transit to the client as we
> > release the spinlock and other backends can drop or alter the info.
> > So, it's better we talk about this in the documentation of the new
> > command and also in the comments saying "clients will have to deal
> > with it."
>
> The same can be said with IDENTIFY_SYSTEM when the flushed location
> becomes irrelevant.  I am not sure that this has any need to apply
> here.  We could add that this is useful to get a streaming start
> position though.

I think that's okay, let's not make any changes or add any comments in
regards to the above. The client is basically bound to get the
snapshot of the data at the time it requests the database.

> > 4) How about we be more descriptive about the error added? This will
> > help identify for which replication slot the command has failed from
> > tons of server logs which really help in debugging and analysis.
> > I suggest we have this:
> > errmsg("cannot use \"%s\" command with logical replication slot
> > \"%s\"", "READ_REPLICATION_SLOT", cmd->slotname);
> > instead of just a plain, non-informative, generic message:
> > errmsg("cannot use \"%s\" with logical replication slots",
>
> Yeah.  I don't mind adding the slot name in this string.

Thanks.

Regards,
Bharath Rupireddy.




Re: MDAM techniques and Index Skip Scan patch

2021-10-23 Thread Dmitry Dolgov
> On Thu, Oct 21, 2021 at 07:16:00PM -0700, Peter Geoghegan wrote:
>
> My general concern is that the skip scan patch may currently be
> structured in a way that paints us into a corner, MDAM-wise.
>
> Note that the MDAM paper treats skipping a prefix of columns as a case
> where the prefix is handled by pretending that there is a clause that
> looks like this: "WHERE date between -inf AND +inf" -- which is not so
> different from the original sales SQL query example that I have
> highlighted. We don't tend to think of queries like this (like my
> sales query) as in any way related to skip-scan, because we don't
> imagine that there is any skipping going on. But maybe we should
> recognize the similarities.

To avoid potential problems with extensibility in this sense, the
implementation needs to explicitly work with sets of disjoint intervals
of values instead of simple prefix size, one set of intervals per scan
key. An interesting idea, doesn't seem to be a big change in terms of
the patch itself.




Re: Allow pg_signal_backend members to use pg_log_backend_memory_stats().

2021-10-23 Thread Michael Paquier
On Sat, Oct 23, 2021 at 08:42:29PM +, Bossart, Nathan wrote:
> Otherwise, presumably we will need to update func.sgml and the comment
> above pg_log_backend_memory_contexts() in mcxtfuncs.c.

Yes, the documentation of any SQL function whose hardcoded superuser()
check is removed needs a refresh to outline that its execution can be
GRANT-ed post-initialization, and it should also document which system
roles are able to use it.  See for instance pg_database_size(), that
mentions roles need to be a member of pg_read_all_stats.

> This is unrelated to this patch, but should we also consider opening
> up pg_reload_conf() and pg_rotate_logfile() to members of
> pg_signal_backend?  Those are the other "server signaling functions" I
> see in the docs.

Yes, there is that as well.

+CREATE ROLE testrole1 IN ROLE pg_signal_backend;
+CREATE ROLE testrole2;
Any role created in the regression test needs to be prefixed with
"regress_", or builds with -DENFORCE_REGRESSION_TEST_NAME_RESTRICTIONS
will complain (I just add that by default to not fall into this trap
again).
--
Michael


signature.asc
Description: PGP signature


Re: Refactoring: join MakeSingleTupleTableSlot() and MakeTupleTableSlot()

2021-10-23 Thread Michael Paquier
On Fri, Oct 22, 2021 at 04:39:37PM +0300, Aleksander Alekseev wrote:
> I propose keeping only one of these procedures to simplify navigating
> through the code and debugging, and maybe saving a CPU cycle or two. A
> search for MakeTupleTableSlot produced 8 matches across 2 files, while
> MakeSingleTupleTableSlot is used 41 times across 26 files. Thus the
> proposed patch removes MakeTupleTableSlot and keeps
> MakeSingleTupleTableSlot to keep the patch less invasive and simplify
> backporting of the other patches. Hopefully, this will not complicate
> the life of the extension developers too much.

To make the life of extension developers easier, we could as well have
a compatibility macro so as anybody using MakeTupleTableSlot() won't
be annoyed by this change.  However, looking around, this does not
look like a popular API so I'd be fine with your change as proposed.

Other opinions?
--
Michael


signature.asc
Description: PGP signature


Re: pg_receivewal starting position

2021-10-23 Thread Michael Paquier
On Sat, Oct 23, 2021 at 10:46:30PM +0530, Bharath Rupireddy wrote:
> 1) It's better to initialize nulls with false, we can avoid setting
> them to true. The instances where the columns are not nulls is going
> to be more than the columns with null values, so we could avoid some
> of nulls[i] = false; instructions.

I don't think that this is an improvement in this case as in the
default case we'd return a tuple full of NULL values if the slot does
not exist, so the existing code is simpler when we don't look at the
slot contents.

> 2) As I previously mentioned, we are not copying the slot contents
> while holding the spinlock, it's just we are taking the memory address
> and releasing the lock, so there is a chance that the memory we are
> looking at can become unavailable or stale while we access
> slot_contents. So, I suggest we do the memcpy of the *slot to
> slot_contents. I'm sure the memcpy ing the entire ReplicationSlot
> contents will be costlier, so let's just take the info that we need
> (data.database, data.restart_lsn) into local variables while we hold
> the spin lock

The style of the patch is consistent with what we do in other areas
(see pg_get_replication_slots as one example).

> + /* Copy slot contents while holding spinlock */
> + SpinLockAcquire(>mutex);
> + slot_contents = *slot;

And what this does is to copy the contents of the slot into a local
area (note that we use a NameData pointing to an area with
NAMEDATALEN).  Aka if the contents of *slot are changed by whatever
reason (this cannot change as of the LWLock acquired), what we have
saved is unchanged as of this command's context.

> 3) The data that the new command returns to the client can actually
> become stale while it is captured and in transit to the client as we
> release the spinlock and other backends can drop or alter the info.
> So, it's better we talk about this in the documentation of the new
> command and also in the comments saying "clients will have to deal
> with it."

The same can be said with IDENTIFY_SYSTEM when the flushed location
becomes irrelevant.  I am not sure that this has any need to apply
here.  We could add that this is useful to get a streaming start
position though.

> 4) How about we be more descriptive about the error added? This will
> help identify for which replication slot the command has failed from
> tons of server logs which really help in debugging and analysis.
> I suggest we have this:
> errmsg("cannot use \"%s\" command with logical replication slot
> \"%s\"", "READ_REPLICATION_SLOT", cmd->slotname);
> instead of just a plain, non-informative, generic message:
> errmsg("cannot use \"%s\" with logical replication slots",

Yeah.  I don't mind adding the slot name in this string.
--
Michael


signature.asc
Description: PGP signature


Predefined role pg_maintenance for VACUUM, ANALYZE, CHECKPOINT.

2021-10-23 Thread Jeff Davis

Add new predefined role pg_maintenance, which can issue VACUUM,
ANALYZE, CHECKPOINT.

Patch attached.

Regards,
Jeff Davis

From e615ad4b4a8c390a4b937dbb5721de5daad5 Mon Sep 17 00:00:00 2001
From: Jeff Davis 
Date: Sat, 23 Oct 2021 13:41:41 -0700
Subject: [PATCH] Predefined role pg_maintenance for VACUUM, ANALYZE,
 CHECKPOINT.

---
 doc/src/sgml/ref/analyze.sgml | 14 +++---
 doc/src/sgml/ref/checkpoint.sgml  |  3 ++-
 doc/src/sgml/ref/vacuum.sgml  | 14 +++---
 src/backend/commands/vacuum.c | 12 +++-
 src/backend/tcop/utility.c|  5 +++--
 src/include/catalog/catversion.h  |  2 +-
 src/include/catalog/pg_authid.dat |  5 +
 7 files changed, 32 insertions(+), 23 deletions(-)

diff --git a/doc/src/sgml/ref/analyze.sgml b/doc/src/sgml/ref/analyze.sgml
index c8fcebc1612..38e3d8916f0 100644
--- a/doc/src/sgml/ref/analyze.sgml
+++ b/doc/src/sgml/ref/analyze.sgml
@@ -148,13 +148,13 @@ ANALYZE [ VERBOSE ] [ table_and_columnsNotes
 
   
-   To analyze a table, one must ordinarily be the table's owner or a
-   superuser.  However, database owners are allowed to
-   analyze all tables in their databases, except shared catalogs.
-   (The restriction for shared catalogs means that a true database-wide
-   ANALYZE can only be performed by a superuser.)
-   ANALYZE will skip over any tables that the calling user
-   does not have permission to analyze.
+   To analyze a table, one must ordinarily be the table's owner, a superuser,
+   or a member of the pg_maintenance role.  However,
+   database owners are allowed to analyze all tables in their databases,
+   except shared catalogs.  (The restriction for shared catalogs means that a
+   true database-wide ANALYZE can only be performed by a
+   superuser.)  ANALYZE will skip over any tables that the
+   calling user does not have permission to analyze.
   
 
   
diff --git a/doc/src/sgml/ref/checkpoint.sgml b/doc/src/sgml/ref/checkpoint.sgml
index 2afee6d7b59..461ccd603e5 100644
--- a/doc/src/sgml/ref/checkpoint.sgml
+++ b/doc/src/sgml/ref/checkpoint.sgml
@@ -52,7 +52,8 @@ CHECKPOINT
   
 
   
-   Only superusers can call CHECKPOINT.
+   Only superusers or members of pg_maintenance can call
+   CHECKPOINT.
   
  
 
diff --git a/doc/src/sgml/ref/vacuum.sgml b/doc/src/sgml/ref/vacuum.sgml
index 3df32b58ee6..72d5b56feb5 100644
--- a/doc/src/sgml/ref/vacuum.sgml
+++ b/doc/src/sgml/ref/vacuum.sgml
@@ -356,13 +356,13 @@ VACUUM [ FULL ] [ FREEZE ] [ VERBOSE ] [ ANALYZE ] [ relisshared))
 		return true;
 
diff --git a/src/backend/tcop/utility.c b/src/backend/tcop/utility.c
index bf085aa93b2..43607eb84af 100644
--- a/src/backend/tcop/utility.c
+++ b/src/backend/tcop/utility.c
@@ -24,6 +24,7 @@
 #include "catalog/catalog.h"
 #include "catalog/index.h"
 #include "catalog/namespace.h"
+#include "catalog/pg_authid.h"
 #include "catalog/pg_inherits.h"
 #include "catalog/toasting.h"
 #include "commands/alter.h"
@@ -939,10 +940,10 @@ standard_ProcessUtility(PlannedStmt *pstmt,
 			break;
 
 		case T_CheckPointStmt:
-			if (!superuser())
+			if (!has_privs_of_role(GetUserId(), ROLE_PG_MAINTENANCE))
 ereport(ERROR,
 		(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
-		 errmsg("must be superuser to do CHECKPOINT")));
+		 errmsg("must be member of pg_maintenance to do CHECKPOINT")));
 
 			RequestCheckpoint(CHECKPOINT_IMMEDIATE | CHECKPOINT_WAIT |
 			  (RecoveryInProgress() ? 0 : CHECKPOINT_FORCE));
diff --git a/src/include/catalog/catversion.h b/src/include/catalog/catversion.h
index 3253b8751b1..1bc331a9a39 100644
--- a/src/include/catalog/catversion.h
+++ b/src/include/catalog/catversion.h
@@ -53,6 +53,6 @@
  */
 
 /*			mmddN */
-#define CATALOG_VERSION_NO	202109101
+#define CATALOG_VERSION_NO	202110231
 
 #endif
diff --git a/src/include/catalog/pg_authid.dat b/src/include/catalog/pg_authid.dat
index 3da68016b61..886969dc646 100644
--- a/src/include/catalog/pg_authid.dat
+++ b/src/include/catalog/pg_authid.dat
@@ -79,5 +79,10 @@
   rolcreaterole => 'f', rolcreatedb => 'f', rolcanlogin => 'f',
   rolreplication => 'f', rolbypassrls => 'f', rolconnlimit => '-1',
   rolpassword => '_null_', rolvaliduntil => '_null_' },
+{ oid => '4544', oid_symbol => 'ROLE_PG_MAINTENANCE',
+  rolname => 'pg_maintenance', rolsuper => 'f', rolinherit => 't',
+  rolcreaterole => 'f', rolcreatedb => 'f', rolcanlogin => 'f',
+  rolreplication => 'f', rolbypassrls => 'f', rolconnlimit => '-1',
+  rolpassword => '_null_', rolvaliduntil => '_null_' },
 
 ]
-- 
2.17.1



Re: Allow pg_signal_backend members to use pg_log_backend_memory_stats().

2021-10-23 Thread Bossart, Nathan
On 10/23/21, 12:57 PM, "Jeff Davis"  wrote:
> pg_signal_backend seems like the appropriate predefined role, because
> pg_log_backend_memory_contexts() is implemented by a sending signal.

This seems reasonable to me.  The stated reason in the original commit
message for keeping it restricted to superusers is because of the
denial-of-service risk, but if you've got pg_signal_backend, you can
already terminate sessions.  The predefined roles documentation notes
that members of pg_signal_backend cannot signal superuser-owned
backends, but AFAICT pg_log_backend_memory_contexts() has no such
restriction at the moment.  Should we add this?

Otherwise, presumably we will need to update func.sgml and the comment
above pg_log_backend_memory_contexts() in mcxtfuncs.c.

This is unrelated to this patch, but should we also consider opening
up pg_reload_conf() and pg_rotate_logfile() to members of
pg_signal_backend?  Those are the other "server signaling functions" I
see in the docs.

Nathan



Re: How to retain lesser paths at add_path()?

2021-10-23 Thread Greg Stark
On Wed, 31 Jul 2019 at 11:45, Robert Haas  wrote:
>
> On Wed, Jul 31, 2019 at 11:07 AM Tom Lane  wrote:
> > What you'd want to do for something like the above, I think, is to
> > have some kind of figure of merit or other special marking for paths
> > that will have some possible special advantage in later planning
> > steps.  Then you can teach add_path that that's another dimension it
> > should consider, in the same way that paths with different sort orders
> > or parallizability attributes don't dominate each other.
>
> Yeah, but I have to admit that this whole design makes me kinda
> uncomfortable.  Every time somebody comes up with a new figure of
> merit, it increases not only the number of paths retained but also the
> cost of comparing two paths to possibly reject one of them.

But this is a fundamental problem with having lots of possible reasons
a path might be good. Not a problem with the algorithm.

I'm imagining that you're both right. If we had some sort of way to
look at the shape of the query and make decisions early on about what
figures of merit might be relevant then we might be able to pick just
a few. Sort of like how we currently only check paths that match some
join or other query feature.


-- 
greg




Re: should we allow users with a predefined role to access pg_backend_memory_contexts view and pg_log_backend_memory_contexts function?gr

2021-10-23 Thread Jeff Davis
On Wed, 2021-10-13 at 11:43 +0530, Bharath Rupireddy wrote:
> Here comes the v2 patch. Note that I've retained superuser() check in
> the pg_log_backend_memory_contexts(). Please review it.

FYI: I submitted a separate patch here to allow pg_signal_backend to
execute pg_log_backend_memory_contexts():


https://www.postgresql.org/message-id/flat/e5cf6684d17c8d1ef4904ae248605ccd6da03e72.ca...@j-davis.com

So we can consider that patch separately.

Regards,
Jeff Davis






Allow pg_signal_backend members to use pg_log_backend_memory_stats().

2021-10-23 Thread Jeff Davis

Simple patch to implement $SUBJECT attached.

pg_signal_backend seems like the appropriate predefined role, because
pg_log_backend_memory_contexts() is implemented by a sending signal.

Regards,
Jeff Davis

From 473e0bb2e1ae0d56dbc6b0262c16b10c6c0454cc Mon Sep 17 00:00:00 2001
From: Jeff Davis 
Date: Fri, 22 Oct 2021 13:40:35 -0700
Subject: [PATCH] Allow pg_signal_backend members to call
 pg_log_backend_memory_contexts().

---
 src/backend/catalog/system_functions.sql |  5 +
 src/backend/utils/adt/mcxtfuncs.c|  6 -
 src/include/catalog/catversion.h |  2 +-
 src/test/regress/expected/misc_functions.out | 23 ++--
 src/test/regress/sql/misc_functions.sql  | 15 +++--
 5 files changed, 40 insertions(+), 11 deletions(-)

diff --git a/src/backend/catalog/system_functions.sql b/src/backend/catalog/system_functions.sql
index a416e94d371..2cd5e3fe17b 100644
--- a/src/backend/catalog/system_functions.sql
+++ b/src/backend/catalog/system_functions.sql
@@ -699,6 +699,8 @@ REVOKE EXECUTE ON FUNCTION pg_ls_dir(text) FROM public;
 
 REVOKE EXECUTE ON FUNCTION pg_ls_dir(text,boolean,boolean) FROM public;
 
+REVOKE EXECUTE ON FUNCTION pg_log_backend_memory_contexts(integer) FROM PUBLIC;
+
 --
 -- We also set up some things as accessible to standard roles.
 --
@@ -713,6 +715,9 @@ GRANT EXECUTE ON FUNCTION pg_ls_tmpdir() TO pg_monitor;
 
 GRANT EXECUTE ON FUNCTION pg_ls_tmpdir(oid) TO pg_monitor;
 
+GRANT EXECUTE ON FUNCTION pg_log_backend_memory_contexts(integer)
+  TO pg_signal_backend;
+
 GRANT pg_read_all_settings TO pg_monitor;
 
 GRANT pg_read_all_stats TO pg_monitor;
diff --git a/src/backend/utils/adt/mcxtfuncs.c b/src/backend/utils/adt/mcxtfuncs.c
index 0d52613bc32..e06280ad712 100644
--- a/src/backend/utils/adt/mcxtfuncs.c
+++ b/src/backend/utils/adt/mcxtfuncs.c
@@ -177,12 +177,6 @@ pg_log_backend_memory_contexts(PG_FUNCTION_ARGS)
 	int			pid = PG_GETARG_INT32(0);
 	PGPROC	   *proc;
 
-	/* Only allow superusers to log memory contexts. */
-	if (!superuser())
-		ereport(ERROR,
-(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
- errmsg("must be a superuser to log memory contexts")));
-
 	proc = BackendPidGetProc(pid);
 
 	/*
diff --git a/src/include/catalog/catversion.h b/src/include/catalog/catversion.h
index 3253b8751b1..039f6338604 100644
--- a/src/include/catalog/catversion.h
+++ b/src/include/catalog/catversion.h
@@ -53,6 +53,6 @@
  */
 
 /*			mmddN */
-#define CATALOG_VERSION_NO	202109101
+#define CATALOG_VERSION_NO	202110230
 
 #endif
diff --git a/src/test/regress/expected/misc_functions.out b/src/test/regress/expected/misc_functions.out
index e845042d38d..38e36fec43e 100644
--- a/src/test/regress/expected/misc_functions.out
+++ b/src/test/regress/expected/misc_functions.out
@@ -138,14 +138,33 @@ HINT:  No function matches the given name and argument types. You might need to
 --
 -- Memory contexts are logged and they are not returned to the function.
 -- Furthermore, their contents can vary depending on the timing. However,
--- we can at least verify that the code doesn't fail.
+-- we can at least verify that the code doesn't fail, and that the
+-- permissions are set properly.
 --
-SELECT * FROM pg_log_backend_memory_contexts(pg_backend_pid());
+SELECT pg_log_backend_memory_contexts(pg_backend_pid());
  pg_log_backend_memory_contexts 
 
  t
 (1 row)
 
+CREATE ROLE testrole1 IN ROLE pg_signal_backend;
+CREATE ROLE testrole2;
+SELECT has_function_privilege('testrole1',
+  'pg_log_backend_memory_contexts(integer)', 'EXECUTE'); -- yes
+ has_function_privilege 
+
+ t
+(1 row)
+
+SELECT has_function_privilege('testrole2',
+  'pg_log_backend_memory_contexts(integer)', 'EXECUTE'); -- no
+ has_function_privilege 
+
+ f
+(1 row)
+
+DROP ROLE testrole1;
+DROP ROLE testrole2;
 --
 -- Test some built-in SRFs
 --
diff --git a/src/test/regress/sql/misc_functions.sql b/src/test/regress/sql/misc_functions.sql
index a398349afc6..abf8b33b11c 100644
--- a/src/test/regress/sql/misc_functions.sql
+++ b/src/test/regress/sql/misc_functions.sql
@@ -35,9 +35,20 @@ SELECT num_nulls();
 --
 -- Memory contexts are logged and they are not returned to the function.
 -- Furthermore, their contents can vary depending on the timing. However,
--- we can at least verify that the code doesn't fail.
+-- we can at least verify that the code doesn't fail, and that the
+-- permissions are set properly.
 --
-SELECT * FROM pg_log_backend_memory_contexts(pg_backend_pid());
+
+SELECT pg_log_backend_memory_contexts(pg_backend_pid());
+
+CREATE ROLE testrole1 IN ROLE pg_signal_backend;
+CREATE ROLE testrole2;
+SELECT has_function_privilege('testrole1',
+  'pg_log_backend_memory_contexts(integer)', 'EXECUTE'); -- yes
+SELECT has_function_privilege('testrole2',
+  'pg_log_backend_memory_contexts(integer)', 'EXECUTE'); -- no
+DROP ROLE testrole1;
+DROP ROLE testrole2;
 
 --
 -- Test some built-in 

Re: [PATCH] Make ENOSPC not fatal in semaphore creation

2021-10-23 Thread Mikhail
On Sun, Oct 17, 2021 at 10:52:38AM -0400, Tom Lane wrote:
> I am, however, concerned that this'll just trade off one hazard for
> another.  Instead of a risk of failing with ENOSPC (which the DBA
> can fix), we'll have a risk of kneecapping some other process at
> random (which the DBA can do nothing to prevent).

I tend to agree, and along with semas patch would like to suggest error
message improvement, it would have saved me about half a day of digging.
Tested on OpenBSD 7.0.

I'm not a native speaker though, so grammar need to be checked.

diff --git a/src/backend/port/sysv_sema.c b/src/backend/port/sysv_sema.c
index 21c883ba9a..b84f70b5e2 100644
--- a/src/backend/port/sysv_sema.c
+++ b/src/backend/port/sysv_sema.c
@@ -133,7 +133,10 @@ InternalIpcSemaphoreCreate(IpcSemaphoreKey semKey, int 
numSems)
 "respective kernel parameter.  
Alternatively, reduce PostgreSQL's "
 "consumption of semaphores by 
reducing its max_connections parameter.\n"
 "The PostgreSQL documentation 
contains more information about "
-"configuring your system for 
PostgreSQL.") : 0));
+"configuring your system for 
PostgreSQL.\n"
+"If server has crashed 
previously there may be resources left "
+"after it - take a look at 
ipcs(1) and ipcrm(1) man pages to see "
+"how to remove them.") : 0));
}
 
return semId;




Re: add retry mechanism for achieving recovery target before emitting FATA error "recovery ended before configured recovery target was reached"

2021-10-23 Thread Jeff Davis
On Sat, 2021-10-23 at 09:31 +0530, Bharath Rupireddy wrote:
> If you are suggesting ...

Your complaint seems to be coming from commit dc788668, so the most
direct answer would be to make that configurable to the old behavior,
not to invent a new timeout behavior.

If I understand correctly, you are doing PITR from an archive, right?
So would restore_command be a reasonable place for the timeout?

And can you provide some approximate numbers to help me understand
where the timeout would be helpful? E.g. you have W GB of WAL to
replay, and restore would take X minutes, but some WAL is missing so
you fail after X-Y minutes, but if you has timeout Z everything would
be great.

> I think pg_waldump can help here to do some exploratory analysis of
> the available WAL in the directory where the WAL files are present.
> Since it is an independent C program, it can run even when the server
> is down and also run on archive location.

Right, it's possible to do, but I think there's room for improvement so
we don't have to always scan the WAL. I'm getting a bit off-topic from
your proposal though. I'll bring it up in another thread when my
thoughts on this are more clear.

Regards,
Jeff Davis






Re: logical decoding/replication: new functions pg_ls_logicaldir and pg_ls_replslotdir

2021-10-23 Thread Bharath Rupireddy
On Fri, Oct 22, 2021 at 9:26 PM Justin Pryzby  wrote:
> I concluded that it's better to add a function to list metadata of an 
> arbitrary
> dir, rather than adding more functions to handle specific, hardcoded dirs:
> https://www.postgresql.org/message-id/flat/20191227170220.ge12...@telsasoft.com

I just had a quick look at the pg_ls_dir_metadata() patch(I didn't
look at the other patches). While it's a good idea to have a single
function for all the PGDATA directories, I'm not sure if one would
ever need the info like type, change, creation path etc. If we do
this, the function will become the linux equivalent command. I don't
see the difference between modification and change time stamps. For
debugging or analytical purposes in production environments, one would
majorly look at the file name, it's size on the disk, modification
time (to decide whether the file is stale or not, creation time (to
decide how old is the file),  file/directory(maybe?). I'm not sure if
your patch has a recursive option for pg_ls_dir_metadata(), if it has,
I think it's more complex from a usability perspective.

And the functions like pg_ls_tmpdir, pg_ls_tmpdir, pg_ls_waldir etc.
(existing) and pg_ls_logicaldir, pg_ls_replslotdir (yet to have) will
provide the better usability compared to a generic function. Having
said this, I don't oppose having a generic function returning the file
name, file size, modification time, creation time, but not other info,
please. If one is interested in knowing the other information file
type, path etc. they can go run linux/windows/OS commands.

In summary what I think at this point is:
1) pg_ls_logicaldir, pg_ls_replslotdir - better for usability and
serving special purpose like their peers
2) modify pg_ls_dir such that it returns the file name, file size,
modification time, creation time, for directories, to be simple, it
shouldn't go recursively over all the directories, it should just
return the directory name, size, modification time, creation time.

If okay, I'm ready to spend time implementing them.

Thoughts?

Regards,
Bharath Rupireddy.




Re: pg_receivewal starting position

2021-10-23 Thread Bharath Rupireddy
On Sat, Oct 23, 2021 at 1:14 PM Michael Paquier  wrote:
>
> On Thu, Oct 21, 2021 at 10:36:42AM +0200, Ronan Dunklau wrote:
> > Done. I haven't touched the timeline switch test patch for now, but I still
> > include it here for completeness.
>
> Thanks.  I have applied and back-patched 0001, then looked again at
> 0002 that adds READ_REPLICATION_SLOT:
> - Change the TLI to use int8 rather than int4, so as we will always be
> right with TimelineID which is unsigned (this was discussed upthread
> but I got back on it after more thoughts, to avoid any future
> issues).
> - Added an extra initialization for the set of Datum values, just as
> an extra safety net.
> - There was a bug with the timeline returned when executing the
> command while in recovery as ThisTimeLineID is 0 in the context of a
> standby, but we need to support the case of physical slots even when
> streaming archives from a standby.  The fix is similar to what we do
> for IDENTIFY_SYSTEM, where we need to use the timeline currently
> replayed from GetXLogReplayRecPtr(), before looking at the past
> timeline history using restart_lsn and the replayed TLI.
>
> With that in place, I think that we are good now for this part.

Thanks for the updated patch. I have following comments on v10:

1) It's better to initialize nulls with false, we can avoid setting
them to true. The instances where the columns are not nulls is going
to be more than the columns with null values, so we could avoid some
of nulls[i] = false; instructions.
+ MemSet(nulls, true, READ_REPLICATION_SLOT_COLS * sizeof(bool));
I suggest we do the following. The number of instances of hitting the
"else" parts will be less.
MemSet(nulls, false, READ_REPLICATION_SLOT_COLS * sizeof(bool));

if (slot == NULL || !slot->in_use)
{
MemSet(nulls, true, READ_REPLICATION_SLOT_COLS * sizeof(bool));
LWLockRelease(ReplicationSlotControlLock);
}

if (!XLogRecPtrIsInvalid(slot_contents.data.restart_lsn))
{
}
else
   nulls[i] = true;

if (!XLogRecPtrIsInvalid(slot_contents.data.restart_lsn))
{
}
else
   nulls[i] = true;

2) As I previously mentioned, we are not copying the slot contents
while holding the spinlock, it's just we are taking the memory address
and releasing the lock, so there is a chance that the memory we are
looking at can become unavailable or stale while we access
slot_contents. So, I suggest we do the memcpy of the *slot to
slot_contents. I'm sure the memcpy ing the entire ReplicationSlot
contents will be costlier, so let's just take the info that we need
(data.database, data.restart_lsn) into local variables while we hold
the spin lock
+ /* Copy slot contents while holding spinlock */
+ SpinLockAcquire(>mutex);
+ slot_contents = *slot;
+ SpinLockRelease(>mutex);
+ LWLockRelease(ReplicationSlotControlLock);

The code will look like following:
Oiddatabase;
XLogRecPtr restart_lsn;

/* Take required information from slot contents while holding
spinlock */
SpinLockAcquire(>mutex);
database= slot->data.database;
restart_lsn= slot->data.restart_lsn;
SpinLockRelease(>mutex);
LWLockRelease(ReplicationSlotControlLock);

3) The data that the new command returns to the client can actually
become stale while it is captured and in transit to the client as we
release the spinlock and other backends can drop or alter the info.
So, it's better we talk about this in the documentation of the new
command and also in the comments saying "clients will have to deal
with it."

4) How about we be more descriptive about the error added? This will
help identify for which replication slot the command has failed from
tons of server logs which really help in debugging and analysis.
I suggest we have this:
errmsg("cannot use \"%s\" command with logical replication slot
\"%s\"", "READ_REPLICATION_SLOT", cmd->slotname);
instead of just a plain, non-informative, generic message:
errmsg("cannot use \"%s\" with logical replication slots",

Regards,
Bharath Rupireddy.




Re: XTS cipher mode for cluster file encryption

2021-10-23 Thread Bruce Momjian
On Tue, Oct 19, 2021 at 02:54:56PM -0400, Stephen Frost wrote:
> * Sasasu (i...@sasa.su) wrote:
> > A unified block-based I/O API sounds great. Has anyone tried to do this
> > before? It would be nice if the front-end tools could also use these API.
> 
> The TDE patch from Cybertec did go down this route, but the API ended up
> being rather different which menat a lot of changes in other parts of
> the system.  If we can get a block-based temporary file method that
> maintains more-or-less the same API, that'd be great, but I'm not sure
> that we can really do so and I am not entirely convinced that we should
> make the TDE effort depend on an otherwise quite independent effort of
> making all temp files usage be block based.

Uh, I thought people felt the Cybertec patch was too large and that a
unified API for temporary file I/O-encryption was a requirement.  Would
a CTR-steaming-encryption API for temporary tables be easier to
implement?

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.





Re: XTS cipher mode for cluster file encryption

2021-10-23 Thread Bruce Momjian
On Mon, Oct 18, 2021 at 12:37:39PM -0400, Robert Haas wrote:
> I do really like the idea of using AES-GCM-SIV not because I know
> anything about it, but because the integrity checking seems cool, and
--
> storing the nonce seems like it would improve security. However, based

Frankly, I think we need to be cautious about doing anything related to
security for "cool" motivations.  (This might be how OpenSSL became such
a mess.)  For non-security features, you can often add a few lines of
code to enable some cool use-case.  For security features, you have to
block its targeted attack methods fully or it is useless.  (It doesn't
need to block all attack methods.)  To fully block attack methods,
security features must be thoroughly designed and all potential
interactions must be researched.

When adding non-security Postgres features, cool features can be more
easily implemented because they are built on the sold foundation of
Postgres.  For security features, you have to assume that attacks can
come from anywhere, so the foundation is unclear and caution is wise.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.





Re: [PATCH] Make ENOSPC not fatal in semaphore creation

2021-10-23 Thread Mikhail
On Fri, Oct 22, 2021 at 09:00:31PM -0400, Tom Lane wrote:
> I tried this on an OpenBSD 6.0 image I had handy.  The good news is
> that it works, and I can successfully start the postmaster with a lot
> of semaphores (I tried with max_connections=1) without any special
> system configuration.  The bad news is it's *slow*.  It takes the
> postmaster over a minute to start up at 1 max_connections, and
> also about 15 seconds to shut down.  The regression tests also appear
> noticeably slower, even at the default max_connections=100.  I'm
> afraid that those "lots of tiny mappings" that Thomas noted have
> a nasty impact on our process launch times, since the kernel
> presumably has to do work to clone them into the child process.
> 
> Now this lashup that I'm testing on is by no means well suited for
> performance tests, so maybe my numbers are bogus.  Also, maybe it's
> better in more recent OpenBSD releases.  But I think we need to take a
> harder look at performance before we decide that it's okay to change
> the default semaphore type for this platform.

I got following results for "time make installcheck" on a laptop with
OpenBSD 7.0 (amd64):

POSIX (max_connections=100) (default):  1m32.39s real 0m03.82s user 0m05.75s 
system
POSIX (max_connections=1):  2m13.11s real 0m03.56s user 0m07.06s 
system

SysV (max_connections=100) (default):   1m24.39s real 0m03.30s user 0m04.94s 
system
SysV (max_connections=1):   failed to start
after sysctl tunning:
SysV (max_connections=1):   1m47.51s real 0m03.78s user 0m05.61s 
system

I can confirm that start and stop of the server was slower in POSIX
case, but not terribly different (seconds, not a minute, as in your
case).

As the OpenBSD developers said - those who use OpenBSD are never after a
good performance, the system has a lot of bottlenecks except IPCs.

I see following reasons to switch from SysV to POSIX:

- consistency in the ports tree, all major ports use POSIX, it means
  better testing of the API
- as already pointed out - OpenBSD isn't about performance, and the
  results for default max_connections are pretty close
- crash recovery with the OS defaults is automatic and don't require DBA
  intervention and knowledge of ipcs and ipcrm
- higher density is available without system tuning

The disadvantage is in a worse performance for extreme cases, but I'm
not sure OpenBSD is used for them in production.




Re: XTS cipher mode for cluster file encryption

2021-10-23 Thread Bruce Momjian
On Tue, Oct 19, 2021 at 02:44:26PM -0400, Stephen Frost wrote:
> Another threat model to consider is if the attacker has read-only access
> to the data directory through, say, unix group read privileges or maybe
> the ability to monitor the traffic on the SAN, or the ability to
> read-only mount the LUN on to another system.  This might be obtained by
> attacking a backup process where the system was configured to run
> physical backups using an unprivileged OS user who only has group read
> access to the cluster (and the necessary but non-superuser privleges in
> the database system to start/stop the backup), or various potential
> attacks at the storage layer.  This is similar to the "data at rest"
> case above in that XTS works well to address this, but because the
> attacker would have ongoing access (rather than just one-time, such as
> in the first case), information such as which blocks are being changed
> inside of a given 8k page might be able to be determined and that could
> be useful information, though a point here: they would already be able
> to see clearly which 8k pages are being changed and which aren't, and
> there's not really any way for us to prevent that reasonably.  As such,
> I'd argue that using XTS is reasonable and we can mitigate some of this
> concern by using the LSN in the tweak instead of just the block number
> as the 'plain64' option in dmcrypt does.  That doing so would mean that

That is an excellent point, and something we should mention in our
documentation --- the fact that a change of 8k granularity will be
visible, and in certain specified cases, 16-byte change granularity will
also be visible.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.





Re: [Proposal] Global temporary tables

2021-10-23 Thread Andrew Bille
Thanks, the vacuum is fixed

But I found another crash (on v57 patches), reproduced with:

psql -t -c "create global temp table t (a integer); insert into t values
(1); select count(*) from t group by t;"
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
connection to server was lost

with trace:

[New LWP 2580215]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `postgres: andrew postgres [local] SELECT
 '.
Program terminated with signal SIGABRT, Aborted.
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
50  ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x7f258d482859 in __GI_abort () at abort.c:79
#2  0x55ad0be8878f in ExceptionalCondition
(conditionName=conditionName@entry=0x55ad0bf19743
"gtt_rnode->att_stat_tups[i]", errorType=errorType@entry=0x55ad0bee500b
"FailedAssertion", fileName=fileName@entry=0x55ad0bf1966b "storage_gtt.c",
lineNumber=lineNumber@entry=902) at assert.c:69
#3  0x55ad0ba9379f in get_gtt_att_statistic (reloid=,
attnum=0, inh=) at storage_gtt.c:902
#4  0x55ad0be35625 in examine_simple_variable
(root=root@entry=0x55ad0c498748,
var=var@entry=0x55ad0c498c68, vardata=vardata@entry=0x7fff06c9ebf0) at
selfuncs.c:5391
#5  0x55ad0be36a89 in examine_variable (root=root@entry=0x55ad0c498748,
node=node@entry=0x55ad0c498c68, varRelid=varRelid@entry=0,
vardata=vardata@entry=0x7fff06c9ebf0) at selfuncs.c:4990
#6  0x55ad0be3ad64 in estimate_num_groups (root=root@entry=0x55ad0c498748,
groupExprs=, input_rows=input_rows@entry=255,
pgset=pgset@entry=0x0, estinfo=estinfo@entry=0x0) at selfuncs.c:3455
#7  0x55ad0bc50835 in get_number_of_groups (root=root@entry=0x55ad0c498748,
path_rows=255, gd=gd@entry=0x0, target_list=0x55ad0c498bb8) at
planner.c:3241
#8  0x55ad0bc5576f in create_ordinary_grouping_paths
(root=root@entry=0x55ad0c498748,
input_rel=input_rel@entry=0x55ad0c3ce148,
grouped_rel=grouped_rel@entry=0x55ad0c4983f0,
agg_costs=agg_costs@entry=0x7fff06c9edb0, gd=gd@entry=0x0,
extra=extra@entry=0x7fff06c9ede0,
partially_grouped_rel_p=0x7fff06c9eda8)
at planner.c:3628
#9  0x55ad0bc55a72 in create_grouping_paths
(root=root@entry=0x55ad0c498748,
input_rel=input_rel@entry=0x55ad0c3ce148, target=target@entry=0x55ad0c4c95d8,
target_parallel_safe=target_parallel_safe@entry=true, gd=gd@entry=0x0) at
planner.c:3377
#10 0x55ad0bc5686d in grouping_planner (root=root@entry=0x55ad0c498748,
tuple_fraction=, tuple_fraction@entry=0) at planner.c:1592
#11 0x55ad0bc57910 in subquery_planner (glob=glob@entry=0x55ad0c497880,
parse=parse@entry=0x55ad0c3cdbb8, parent_root=parent_root@entry=0x0,
hasRecursion=hasRecursion@entry=false, tuple_fraction=tuple_fraction@entry=0)
at planner.c:1025
#12 0x55ad0bc57f36 in standard_planner (parse=0x55ad0c3cdbb8,
query_string=, cursorOptions=2048, boundParams=0x0) at
planner.c:406
#13 0x55ad0bc584d4 in planner (parse=parse@entry=0x55ad0c3cdbb8,
query_string=query_string@entry=0x55ad0c3cc470 "create global temp table t
(a integer); insert into t values (1); select count(*) from t group by t;",
cursorOptions=cursorOptions@entry=2048, boundParams=boundParams@entry=0x0)
at planner.c:277
#14 0x55ad0bd4855f in pg_plan_query
(querytree=querytree@entry=0x55ad0c3cdbb8,
query_string=query_string@entry=0x55ad0c3cc470 "create global temp table t
(a integer); insert into t values (1); select count(*) from t group by t;",
cursorOptions=cursorOptions@entry=2048, boundParams=boundParams@entry=0x0)
at postgres.c:847
#15 0x55ad0bd4863b in pg_plan_queries (querytrees=0x55ad0c4986f0,
query_string=query_string@entry=0x55ad0c3cc470 "create global temp table t
(a integer); insert into t values (1); select count(*) from t group by t;",
cursorOptions=cursorOptions@entry=2048, boundParams=boundParams@entry=0x0)
at postgres.c:939
#16 0x55ad0bd48b20 in exec_simple_query
(query_string=query_string@entry=0x55ad0c3cc470
"create global temp table t (a integer); insert into t values (1); select
count(*) from t group by t;") at postgres.c:1133
#17 0x55ad0bd4aaf3 in PostgresMain (dbname=,
username=) at postgres.c:4497
#18 0x55ad0bca3f91 in BackendRun (port=port@entry=0x55ad0c3f1020) at
postmaster.c:4560
#19 0x55ad0bca7115 in BackendStartup (port=port@entry=0x55ad0c3f1020)
at postmaster.c:4288
#20 0x55ad0bca735c in ServerLoop () at postmaster.c:1801
#21 0x55ad0bca893e in PostmasterMain (argc=3, argv=) at
postmaster.c:1473
#22 0x55ad0bbe8e31 in main (argc=3, argv=0x55ad0c3c6660) at main.c:198

On Thu, Oct 21, 2021 at 4:25 PM wenjing  wrote:

>
>
> Andrew Bille  于2021年10月20日周三 上午2:59写道:
>
>> Another thanks for the fix. It works for me.
>>
>> But I found another crash!
>>
> This is a 

Re: pg_receivewal starting position

2021-10-23 Thread Michael Paquier
On Thu, Oct 21, 2021 at 10:36:42AM +0200, Ronan Dunklau wrote:
> Done. I haven't touched the timeline switch test patch for now, but I still 
> include it here for completeness.

Thanks.  I have applied and back-patched 0001, then looked again at
0002 that adds READ_REPLICATION_SLOT:
- Change the TLI to use int8 rather than int4, so as we will always be
right with TimelineID which is unsigned (this was discussed upthread
but I got back on it after more thoughts, to avoid any future
issues).
- Added an extra initialization for the set of Datum values, just as
an extra safety net.
- There was a bug with the timeline returned when executing the
command while in recovery as ThisTimeLineID is 0 in the context of a
standby, but we need to support the case of physical slots even when
streaming archives from a standby.  The fix is similar to what we do
for IDENTIFY_SYSTEM, where we need to use the timeline currently
replayed from GetXLogReplayRecPtr(), before looking at the past
timeline history using restart_lsn and the replayed TLI.

With that in place, I think that we are good now for this part.
--
Michael
From 7e20a294d18c280b8031ee6cc862ba6a661cf40f Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Sat, 23 Oct 2021 16:28:10 +0900
Subject: [PATCH v10] Add replication command READ_REPLICATION_SLOT

The command is supported for physical slots for now, and returns the
type of slot, its restart_lsn and its restart_tli.

This will be useful for an upcoming patch related to pg_receivewal, to
allow the tool to be able to stream from the position of a slot, rather
than the last WAL position flushed by the backend (as reported by
IDENTIFY_SYSTEM), if the archive directory is found as empty, which
would be an advantage in the case of switching to a different archive
location with the same slot used to avoid holes in what gets backed up.

Author: Ronan Dunklau
Reviewed-by: Kyotaro Horiguchi, Michael Paquier, Bharath Rupireddy
Discussion: https://postgr.es/m/18708360.4lzOvYHigE@aivenronan
---
 src/include/nodes/nodes.h   |   1 +
 src/include/nodes/replnodes.h   |  11 ++
 src/backend/replication/repl_gram.y |  16 ++-
 src/backend/replication/repl_scanner.l  |   1 +
 src/backend/replication/walsender.c | 105 
 src/test/recovery/t/001_stream_rep.pl   |  32 +-
 src/test/recovery/t/006_logical_decoding.pl |  11 +-
 doc/src/sgml/protocol.sgml  |  48 +
 src/tools/pgindent/typedefs.list|   1 +
 9 files changed, 223 insertions(+), 3 deletions(-)

diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index e0057daa06..541e9861ba 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -496,6 +496,7 @@ typedef enum NodeTag
 	T_BaseBackupCmd,
 	T_CreateReplicationSlotCmd,
 	T_DropReplicationSlotCmd,
+	T_ReadReplicationSlotCmd,
 	T_StartReplicationCmd,
 	T_TimeLineHistoryCmd,
 	T_SQLCmd,
diff --git a/src/include/nodes/replnodes.h b/src/include/nodes/replnodes.h
index faa3a251f2..a746fafc12 100644
--- a/src/include/nodes/replnodes.h
+++ b/src/include/nodes/replnodes.h
@@ -87,6 +87,17 @@ typedef struct StartReplicationCmd
 } StartReplicationCmd;
 
 
+/* --
+ *		READ_REPLICATION_SLOT command
+ * --
+ */
+typedef struct ReadReplicationSlotCmd
+{
+	NodeTag		type;
+	char	   *slotname;
+} ReadReplicationSlotCmd;
+
+
 /* --
  *		TIMELINE_HISTORY command
  * --
diff --git a/src/backend/replication/repl_gram.y b/src/backend/replication/repl_gram.y
index 126380e2df..dcb1108579 100644
--- a/src/backend/replication/repl_gram.y
+++ b/src/backend/replication/repl_gram.y
@@ -64,6 +64,7 @@ static SQLCmd *make_sqlcmd(void);
 /* Keyword tokens. */
 %token K_BASE_BACKUP
 %token K_IDENTIFY_SYSTEM
+%token K_READ_REPLICATION_SLOT
 %token K_SHOW
 %token K_START_REPLICATION
 %token K_CREATE_REPLICATION_SLOT
@@ -94,7 +95,7 @@ static SQLCmd *make_sqlcmd(void);
 %type 	command
 %type 	base_backup start_replication start_logical_replication
 create_replication_slot drop_replication_slot identify_system
-timeline_history show sql_cmd
+read_replication_slot timeline_history show sql_cmd
 %type 	base_backup_legacy_opt_list generic_option_list
 %type 	base_backup_legacy_opt generic_option
 %type 	opt_timeline
@@ -125,6 +126,7 @@ command:
 			| start_logical_replication
 			| create_replication_slot
 			| drop_replication_slot
+			| read_replication_slot
 			| timeline_history
 			| show
 			| sql_cmd
@@ -140,6 +142,18 @@ identify_system:
 }
 			;
 
+/*
+ * READ_REPLICATION_SLOT %s
+ */
+read_replication_slot:
+			K_READ_REPLICATION_SLOT var_name
+{
+	ReadReplicationSlotCmd *n = makeNode(ReadReplicationSlotCmd);
+	n->slotname = $2;
+	$$ = (Node *) n;
+}
+			;
+
 /*
  * SHOW setting
  */
diff --git a/src/backend/replication/repl_scanner.l b/src/backend/replication/repl_scanner.l
index c038a636c3..1b599c255e