Re: row filtering for logical replication

2022-01-31 Thread Greg Nancarrow
On Tue, Feb 1, 2022 at 2:45 PM houzj.f...@fujitsu.com <
houzj.f...@fujitsu.com> wrote:
>
> On Monday, January 31, 2022 9:02 PM Amit Kapila 
> >
> > 2.
> > + if (pubrinfo->pubrelqual)
> > + appendPQExpBuffer(query, " WHERE (%s)", pubrinfo->pubrelqual);
> > + appendPQExpBufferStr(query, ";\n");
> >
> > Do we really need additional '()' for row filter expression here? See
the below
> > output from pg_dump:
> >
> > ALTER PUBLICATION pub1 ADD TABLE ONLY public.t1 WHERE ((c1 < 100));
>
> I will investigate this and change this later if needed.
>

I don't think we can make this change (i.e. remove the additional
parentheses), because then a "WHERE (TRUE)" row filter would result in
invalid pg_dump output:

e.g.   ALTER PUBLICATION pub1 ADD TABLE ONLY public.test1 WHERE TRUE;

(since currently, parentheses are required around the publication WHERE
expression)

See also the following commit, which specifically added these parentheses
and catered for WHERE TRUE:
https://www.postgresql.org/message-id/attachment/121245/0005-fixup-Publication-WHERE-condition-support-for-pg_dum.patch
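
For illustration, with a "WHERE (TRUE)" filter the two forms look like this
(the first is what pg_dump must emit; the second fails under the current
grammar):

  ALTER PUBLICATION pub1 ADD TABLE ONLY public.test1 WHERE (TRUE);  -- valid
  ALTER PUBLICATION pub1 ADD TABLE ONLY public.test1 WHERE TRUE;    -- syntax error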

Regards,
Greg Nancarrow
Fujitsu Australia


Re: XTS cipher mode for cluster file encryption

2022-01-31 Thread Antonin Houska
Bruce Momjian  wrote:

> On Mon, Nov 29, 2021 at 08:37:31AM +0100, Antonin Houska wrote:
> > The changes to buffile.c are not trivial, but we haven't really changed the
> > API, as long as you mean BufFileCreateTemp(), BufFileWrite(), BufFileRead().
> > 
> > What our patch affects on the caller side is that BufFileOpenTransient(),
> > BufFileCloseTransient(), BufFileWriteTransient() and BufFileReadTransient()
> > replace OpenTransientFile(), CloseTransientFile(), write()/fwrite() and
> > read()/fread() respectively in reorderbuffer.c and in pgstat.c. These
> > changes become a little bit less invasive in TDE 1.1 than they were in
> > 1.0; see [1] and the diffs attached.
> 
> With pg_upgrade modified to preserve the relfilenode, tablespace oid, and
> database oid, we are now closer to implementing cluster file encryption
> using XTS.  I think we have a few steps left:
> 
> 1.  modify temporary file I/O to use a more centralized API
> 2.  modify the existing cluster file encryption patch to use XTS with an
> IV that uses more than the LSN
> 3.  add XTS regression test code like CTR
> 4.  create WAL encryption code using CTR
> 
> If we can do #1 in PG 15 I think I can have #2 ready for PG 16 in July.
> The feature wiki page is:
> 
>   https://wiki.postgresql.org/wiki/Transparent_Data_Encryption
> 
> Do people want to advance this feature forward?

I confirm that we (Cybertec) do and that we're ready to spend more time on the
community implementation.

-- 
Antonin Houska
Web: https://www.cybertec-postgresql.com




Re: Add checkpoint and redo LSN to LogCheckpointEnd log message

2022-01-31 Thread Kyotaro Horiguchi
At Tue, 1 Feb 2022 10:08:04 +0530, Bharath Rupireddy 
 wrote in 
> On Tue, Feb 1, 2022 at 9:49 AM Fujii Masao  
> wrote:
> >
> > My previous comment was confusing... Probably I understand why you tried to 
> > put this information in checkpoint log message. But I was suggesting to put 
> > that information at the end of log message instead of the beginning of it. 
> > Because ordinary users would be less interested in this LSN information 
> > than other ones like the number of buffers written.
> 
> Actually, there's no strong reason to put LSN info at the beginning of
> the message except that LSN/REDO LSN next to the
> checkpoint/restartpoint complete would make the users understand the
> LSN and REDO LSN belong to the checkpoint/restartpoint. Since this
> wasn't a strong reason, I agree to keep it at the end.
> 
> Modified in v8.
> 
> [1]
> 2022-02-01 04:34:17.657 UTC [3597073] LOG:  checkpoint complete: wrote
> 21 buffers (0.1%); 0 WAL file(s) added, 0 removed, 0 recycled;
> write=0.004 s, sync=0.008 s, total=0.031 s; sync files=18,
> longest=0.006 s, average=0.001 s; distance=77 kB, estimate=77 kB;
> lsn=0/14D5AF0, redo lsn=0/14D5AB8

0001 looks good to me.

I tend to agree with 0002.


FWIW, I collected other user-facing usage of "location" as LSN.

xlog.c:5965, 6128:
  (errmsg("recovery stopping before WAL location (LSN) \"%X/%X\"",

xlog.c:6718:
  (errmsg("control file contains invalid checkpoint location")));

xlog.c:6846:
  (errmsg("starting point-in-time recovery to WAL location (LSN) \"%X/%X\"",

xlog.c:6929:
  (errmsg("could not find redo location referenced by checkpoint record"),

xlog.c:11298, 11300:  (in backup-label)
  appendStringInfo(labelfile, "START WAL LOCATION: %X/%X (file %s)\n",
  appendStringInfo(labelfile, "CHECKPOINT LOCATION: %X/%X\n",
  (and corresponding reader-code)

xlog.c:11791, 11793:  (in backup history file)
  fprintf(fp, "START WAL LOCATION: %X/%X (file %s)\n",
  fprintf(fp, "STOP WAL LOCATION: %X/%X (file %s)\n",
  (and corresponding reader-code)

repl_scanner.l:151:
  yyerror("invalid streaming start location");

pg_proc.dat:
  many function descriptions use "location" as LSN.

pg_waldump.c:768,777,886,938,1029,1071,1083:
  printf(_("  -e, --end=RECPTR   stop reading at WAL location RECPTR\n"));
  printf(_("  -s, --start=RECPTR start reading at WAL location RECPTR\n"));
  pg_log_error("could not parse end WAL location \"%s\"",
  pg_log_error("could not parse start WAL location \"%s\"",
  pg_log_error("start WAL location %X/%X is not inside file \"%s\"",
  pg_log_error("end WAL location %X/%X is not inside file \"%s\"",
  pg_log_error("no start WAL location given");

pg_basebackup.c:476, 615: (confusing with file/directory path..)
  pg_log_error("could not parse write-ahead log location \"%s\"",
  pg_log_error("could not parse write-ahead log location \"%s\"",

pg_rewind.c:346:
  pg_log_info("servers diverged at WAL location %X/%X on timeline %u",
pg_rewind/timeline.c:82:
  pg_log_error("Expected a write-ahead log switchpoint location.");

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Design of pg_stat_subscription_workers vs pgstats

2022-01-31 Thread Masahiko Sawada
On Fri, Jan 28, 2022 at 2:59 PM Amit Kapila  wrote:
>
> On Fri, Jan 28, 2022 at 1:49 AM David G. Johnston
>  wrote:
> >
> > On Thu, Jan 27, 2022 at 5:08 AM Amit Kapila  wrote:
> >>
> >> On Thu, Jan 27, 2022 at 11:16 AM Andres Freund  wrote:
> >> >
> >> > On 2022-01-25 20:27:07 +0900, Masahiko Sawada wrote:
> >> >
> >> > > There will be some challenges in a case where updating
> >> > > pg_subscription_rel also failed (what to report to the user, etc.).
> >> > > And moreover, we don't want to consume space for temporary
> >> > > information in the system catalog.
> >> >
> >> > You're consuming resources in a *WAY* worse way right now. The stats 
> >> > file gets
> >> > constantly written out, and quite often read back by backends. In 
> >> > contrast to
> >> > parts of pg_subscription_rel or such that data can't be removed from
> >> > shared_buffers under pressure.
> >> >
> >>
> >> I don't think pg_subscription_rel is the right place to store error
> >> info as the error can happen say while processing some message type
> >> like BEGIN where we can't map it to pg_subscription_rel entry. There
> >> could be other cases as well where we won't be able to map it to
> >> pg_subscription_rel like some error related to some other table while
> >> processing trigger function.
> >>
> >> In general, there doesn't appear to be much advantage in storing all
> >> the error info in system catalogs as we don't want it to be persistent
> >> (crash-safe). Also, this information is not about any system object
> >> that future operations can use, so won't map from that angle as well.
> >
> >
> > Repeating myself here to try and keep complaints regarding 
> > pg_stat_subscription_worker in one place.
> >
> > This is my specific email with respect to the pg_stat_subscription_workers 
> > design.
> >
> > https://www.postgresql.org/message-id/CAKFQuwZbFuPSV1WLiNFuODst1sUZon2Qwbj8d9tT%3D38hMhJfvw%40mail.gmail.com
> >
> > Specifically,
> >
> > pg_stat_subscription_workers is defined as showing:
> > "will contain one row per subscription
> > worker on which errors have occurred, for workers applying logical
> > replication changes and workers handling the initial data copy of the
> > subscribed tables."
> >
> > The fact that these errors remain (last_error_*) even after they are no 
> > longer relevant is my main gripe regarding this feature.  The information 
> > itself is generally useful though last_error_count is not.  These fields 
> > should auto-clear and be named current_error_* as they exist for purposes 
> > of describing the current state of any error-encountering logical 
> > replication worker so that ALTER SUBSCRIPTION SKIP, or someone other manual 
> > intervention, can be done with that knowledge without having to scan the 
> > subscriber's server logs.
> >
>
> We can discuss the names of columns, but the main reason was that if
> tomorrow, say, we want to account for total errors, not only the
> current error, then we would have to introduce a field error_count or
> something like that, which would then conflict with names like
> current_*. Similarly for transaction abort_count. In the initial
> versions of the patch, we were not using last_* for column names, but
> similar arguments led us to change the names to last_* terminology,
> and the same is used in pg_stat_archiver. But feel free to suggest
> better names. Yes, I agree with the auto-clear point as well, and
> there seems to be agreement for doing it after the next successful
> apply and/or after we skipped the failed xact.
>
> > This is my email trying to understand reality better in order to figure out 
> > what exactly is causing the limitations that are negatively impacting the 
> > design of this feature.
> >
> > https://www.postgresql.org/message-id/CAKFQuwYJ7dsW%2BStsw5%2BZVoY3nwQ9j6pPt-7oYjGddH-h7uVb%2Bg%40mail.gmail.com
> >
> > In short, it was convenient to use the statistics collector here even if 
> > doing so resulted in a non-user-friendly (IMO) design.  Given all of the 
> > limitations to the statistics collection infrastructure, and the fact that 
> > this data is not statistical in the usual usage of the term, I find that to 
> > be less than satisfying.
> >
>
> I think the failures/conflicts are also important information for
> users to know, so having a view of those doesn't appear to be a bad
> idea. All this data is less suitable for system catalogs like
> pg_subscription_rel or others for the reasons quoted in my previous
> email [1].

I agree that it's better to use some other IPC mechanism for the ALTER
SUBSCRIPTION SKIP feature to pass error-XID or error-LSN information to
the worker, and I'm likewise not sure of the advantages of storing all
error information in a system catalog. Since all we need for this
purpose is the error-XID/LSN, could we store only the error-XID/LSN in
the catalog? That is, the worker stores the error-XID/LSN in the
catalog on an error, and the ALTER SUBSCRIPTION SKIP command enables
the worker to skip the transaction in question. The 

Re: row filtering for logical replication

2022-01-31 Thread Amit Kapila
On Tue, Feb 1, 2022 at 9:15 AM houzj.f...@fujitsu.com
 wrote:
>
> On Monday, January 31, 2022 9:02 PM Amit Kapila 
> >
>
> > 3.
> > + /* row filter (if any) */
> > + if (pset.sversion >= 15)
> > + {
> > + if (!PQgetisnull(result, i, 1))
> > + appendPQExpBuffer(&buf, " WHERE %s", PQgetvalue(result, i, 1)); }
> >
> > I don't think we need this version check if, while forming the query, we
> > use NULL as the second column in the corresponding query for v < 15.
>
> Changed.
>

But, I don't see a corresponding change in the else part of the query:
else
{
printfPQExpBuffer(&buf,
  "SELECT pubname\n"
  "FROM pg_catalog.pg_publication p\n"
  "JOIN pg_catalog.pg_publication_rel pr ON p.oid = pr.prpubid\n"
  "WHERE pr.prrelid = '%s'\n"
  "UNION ALL\n"
  "SELECT pubname\n"
  "FROM pg_catalog.pg_publication p\n"
  "WHERE p.puballtables AND pg_catalog.pg_relation_is_publishable('%s')\n"
  "ORDER BY 1;",
  oid, oid);
}

Don't we need to do that to keep it working with previous versions?
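
Presumably the fix is just to select NULL as the second column in this
branch too, matching the >= 15 query; a rough sketch (not the actual patch):

    printfPQExpBuffer(&buf,
                      "SELECT pubname, NULL\n"
                      "FROM pg_catalog.pg_publication p\n"
                      "JOIN pg_catalog.pg_publication_rel pr ON p.oid = pr.prpubid\n"
                      "WHERE pr.prrelid = '%s'\n"
                      "UNION ALL\n"
                      "SELECT pubname, NULL\n"
                      "FROM pg_catalog.pg_publication p\n"
                      "WHERE p.puballtables AND pg_catalog.pg_relation_is_publishable('%s')\n"
                      "ORDER BY 1;",
                      oid, oid);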

-- 
With Regards,
Amit Kapila.




Re: row filtering for logical replication

2022-01-31 Thread Greg Nancarrow
On Tue, Feb 1, 2022 at 2:45 PM houzj.f...@fujitsu.com
 wrote:
>
> Attach the V75 patch set which address the above, Amit's[1] and Greg's[2][3] 
> comments.
>

In the v74-0001 patch (and now in the v75-0001 patch) a change was made
in the GetTopMostAncestorInPublication() function, to get the relation
and schema publications lists (for the ancestor Oid) up-front:

+ List       *apubids = GetRelationPublications(ancestor);
+ List       *aschemaPubids = GetSchemaPublications(get_rel_namespace(ancestor));
+
+ if (list_member_oid(apubids, puboid) ||
+     list_member_oid(aschemaPubids, puboid))
+     topmost_relid = ancestor;

However, this seems less efficient in the case where a match is found
in the first list searched, since then there was actually no reason to
build the second list at all.
Instead, how about something like this:

List       *apubids = GetRelationPublications(ancestor);
List       *aschemaPubids = NULL;

if (list_member_oid(apubids, puboid) ||
    list_member_oid(aschemaPubids =
                    GetSchemaPublications(get_rel_namespace(ancestor)), puboid))
    topmost_relid = ancestor;

or, if that is considered a bit ugly due to the assignment within the
function parameters, alternatively:

List       *apubids = GetRelationPublications(ancestor);
List       *aschemaPubids = NULL;

if (list_member_oid(apubids, puboid))
    topmost_relid = ancestor;
else
{
    aschemaPubids = GetSchemaPublications(get_rel_namespace(ancestor));
    if (list_member_oid(aschemaPubids, puboid))
        topmost_relid = ancestor;
}

Regards,
Greg Nancarrow
Fujitsu Australia




Re: Why is src/test/modules/committs/t/002_standby.pl flaky?

2022-01-31 Thread Thomas Munro
On Tue, Jan 25, 2022 at 3:50 PM Tom Lane  wrote:
> Thomas Munro  writes:
> > I vote for reverting in release branches only.  I'll propose a better
> > WES patch set for master that hopefully also covers async append etc
> > (which I was already planning to do before we knew about this Windows
> > problem).  More soon.
>
> WFM, but we'll have to remember to revert this in v15 if we don't
> have a solid fix by then.

Phew, after a couple of days of very slow compile/test cycles on
Windows exploring a couple of different ideas, I finally have
something new.  First let me recap the three main ideas in this
thread:

1.  It sounds like no one really loves the WSAPoll() kludge, even
though it apparently works for simple cases.  It's not totally clear
that it really works in enough cases, for one thing.  It doesn't allow
for a socket to be in two WESes at the same time, and I'm not sure I
want to bank on Winsock's WSAPoll() being guaranteed to report POLLHUP
when half closed (as mentioned, no other OS does AFAIK).

2.  The long-lived-WaitEventSets-everywhere concept was initially
appealing to me and solves the walreceiver problem (when combined
with a sticky seen_fd_close flag), and I've managed to get that
working correctly across libpq reconnects.  As mentioned, I also have
some toy patches along those lines for the equivalent but more complex
problem in postgres_fdw, because I've been studying how to make
parallel append generate a tidy stream of epoll_wait()/kevent() calls,
instead of a quadratic explosion of setup/teardown spam.  I'll write
some more about those patches and hopefully propose them soon anyway,
but on reflection I don't really want that Unix efficiency problem to
be tangled up with this Windows correctness problem.  That'd require a
programming rule that I don't want to burden us with forever: you'd
*never* be able to use a socket in more than one WaitEventSet, and
WaitLatchOrSocket() would have to be removed.

3.  The real solution to this problem is to recognise that we just
have the event objects in the wrong place.  WaitEventSets shouldn't
own them: they need to be 1:1 with sockets, or Winsock will eat
events.  Likewise for the flag you need for edge->level conversion, or
*we'll* eat events.  Having now tried that, it's starting to feel like
the best way forward, even though my initial prototype (see attached)
is maybe a tad cumbersome with bookkeeping.  I believe it means that
all existing coding patterns *should* now be safe (not yet confirmed
by testing), and we're free to put sockets in multiple WESes even at
the same time if the need arises.

The basic question is: how should a socket user find the associated
event handle and flags?  Some answers:

1.  "pgsocket" could become a pointer to a heap-allocated wrapper
object containing { socket, event, flags } on Windows, or something
like that, but that seems a bit invasive and tangled up with public
APIs like libpq, which put me off trying that.  I'm willing to explore
it if people object to my other idea.

2.  "pgsocket" could stay unchanged, but we could have a parallel
array with extra socket state, indexed by file descriptor.  We could
use new socket()/close() libpq events so that libpq's sockets could be
registered in this scheme without libpq itself having to know anything
about that.  That worked pretty nicely when I developed it on my
FreeBSD box, but on Windows I soon learned that SOCKET is really yet
another name for HANDLE, so it's not a dense number space anchored at
0 like Unix file descriptors.  The array could be prohibitively big.

3.  I tried the same as #2 but with a hash table, and ran into another
small problem when putting it all together: we probably don't want to
longjump out of libpq callbacks on allocation failure.  So, I modified
simplehash to add a no-OOM behaviour.  That's the POC patch set I'm
attaching for show-and-tell.  Some notes and TODOs in the commit
messages and comments.

Thoughts?
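
For concreteness, here is a rough sketch of the per-socket state that
answer #3 implies (the field names are mine, not from the attached patches):

  /* One entry per socket, kept in a hash table keyed by the SOCKET handle. */
  typedef struct SocketTableEntry
  {
      pgsocket    sock;           /* hash key */
      HANDLE      event;          /* long-lived WSA event object, 1:1 with socket */
      int         flags;          /* remembered FD_* events, for edge->level conversion */
      bool        seen_fd_close;  /* sticky: FD_CLOSE observed at least once */
  } SocketTableEntry;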
From bdd90aeb65d82ecae8fe58b441d25a1e1b129bf3 Mon Sep 17 00:00:00 2001
From: Thomas Munro 
Date: Sat, 29 Jan 2022 02:15:10 +1300
Subject: [PATCH 1/3] Add low level socket events for libpq.

Provide a way to get a callback when a socket is created or closed.

XXX TODO handle callback failure
XXX TODO investigate overheads/other implications of having a callback
installed
---
 src/interfaces/libpq/fe-connect.c   | 24 
 src/interfaces/libpq/libpq-events.c | 11 +++
 src/interfaces/libpq/libpq-events.h | 10 +-
 3 files changed, 44 insertions(+), 1 deletion(-)

diff --git a/src/interfaces/libpq/fe-connect.c b/src/interfaces/libpq/fe-connect.c
index a6a1db3356..ddc3f38cf1 100644
--- a/src/interfaces/libpq/fe-connect.c
+++ b/src/interfaces/libpq/fe-connect.c
@@ -452,7 +452,19 @@ pqDropConnection(PGconn *conn, bool flushInput)
 
 	/* Close the socket itself */
 	if (conn->sock != PGINVALID_SOCKET)
+	{
+		/* Report that the socket is closing. */
+		for (int i = 0; i < conn->nEvents; i++)
+		{

Re: Add connection active, idle time to pg_stat_activity

2022-01-31 Thread Kyotaro Horiguchi
At Mon, 31 Jan 2022 15:11:56 +0100, Sergey Dudoladov 
 wrote in 
> > >   if (beentry->st_state == STATE_RUNNING ||
> > >   beentry->st_state == STATE_FASTPATH)
> > > - pgstat_count_conn_active_time((PgStat_Counter) secs * 1000000 + usecs);
> > > + {
> > > + pgstat_count_conn_active_time((PgStat_Counter) usecs_diff);
> > > + beentry->st_total_active_time += usecs_diff;
> > > + }
> > >
> > > The two lines operate exactly the same way on variables with slightly
> > > different behavior. pgStatActiveTime is reported at transaction end
> > > and reset at every tabstat reporting. st_total_active_time is reported
> > > immediately and reset at session end. Since we do the latter, the
> > > first can be omitted by remembering the last values for the local
> > > variables at every reporting.  This needs two additional exporting
> >
> > Of course it's typo(?) of "values of the shared variables".
> 
> Could you please elaborate on this idea?
> So we have pgStatActiveTime and pgStatIdleInTransactionTime ultimately
> used to report the respective metrics in pg_stat_database.
> Now beentry's st_total_active_time / st_total_transaction_idle_time
> duplicates this info, so one may get rid of the pgStat*Time counters.
> Is the idea to report, instead of them, at every tabstat reporting the
> difference between the last memorized value of st_total_*_time and its
> current value?

Exactly. The attached first diff is the sketch of that.

> > > This needs two additional exporting
> > > functions in pgstatfuncs like pgstat_get_my_queryid, so others might
> > > think differently.
> 
> What would be example functions to look at ?

pgstat_get_my_queryid..


And, it seems like I forgot to mention this, but as Kuntal suggested
(in a different context and objective, though) upthread, I think that
we can show realtime values in the two time fields by adding the time
of the current state.  See the attached second diff.
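
In other words, roughly this (a sketch; it assumes the
st_state_start_timestamp field that backend_status.c already maintains):

  /* committed total, plus the elapsed time of the state we are still in */
  int64       total_active = beentry->st_total_active_time;

  if (beentry->st_state == STATE_RUNNING ||
      beentry->st_state == STATE_FASTPATH)
      total_active += GetCurrentTimestamp() - beentry->st_state_start_timestamp;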

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 0646f53098..27419c1851 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -249,8 +249,8 @@ static int  pgStatXactRollback = 0;
 PgStat_Counter pgStatBlockReadTime = 0;
 PgStat_Counter pgStatBlockWriteTime = 0;
 static PgStat_Counter pgLastSessionReportTime = 0;
-PgStat_Counter pgStatActiveTime = 0;
-PgStat_Counter pgStatTransactionIdleTime = 0;
+PgStat_Counter pgStatLastActiveTime = 0;
+PgStat_Counter pgStatLastTransactionIdleTime = 0;
 SessionEndType pgStatSessionEndCause = DISCONNECT_NORMAL;
 
 /* Record that's written to 2PC state file when pgstat state is persisted */
@@ -1026,8 +1026,13 @@ pgstat_send_tabstat(PgStat_MsgTabstat *tsmsg, 
TimestampTz now)
TimestampDifference(pgLastSessionReportTime, now, &secs, &usecs);
pgLastSessionReportTime = now;
tsmsg->m_session_time = (PgStat_Counter) secs * 1000000 + usecs;
-   tsmsg->m_active_time = pgStatActiveTime;
-   tsmsg->m_idle_in_xact_time = pgStatTransactionIdleTime;
+
+   /* send the difference since the last report */
+   tsmsg->m_active_time =
+       pgstat_get_my_active_time() - pgStatLastActiveTime;
+   tsmsg->m_idle_in_xact_time =
+       pgstat_get_my_transaction_idle_time() -
+       pgStatLastTransactionIdleTime;
}
else
{
@@ -1039,8 +1044,8 @@ pgstat_send_tabstat(PgStat_MsgTabstat *tsmsg, TimestampTz 
now)
pgStatXactRollback = 0;
pgStatBlockReadTime = 0;
pgStatBlockWriteTime = 0;
-   pgStatActiveTime = 0;
-   pgStatTransactionIdleTime = 0;
+   pgStatLastActiveTime = pgstat_get_my_active_time();
+   pgStatLastTransactionIdleTime = pgstat_get_my_transaction_idle_time();
}
else
{
diff --git a/src/backend/utils/activity/backend_status.c 
b/src/backend/utils/activity/backend_status.c
index 5f15dcdc05..8b6836a662 100644
--- a/src/backend/utils/activity/backend_status.c
+++ b/src/backend/utils/activity/backend_status.c
@@ -613,15 +613,9 @@ pgstat_report_activity(BackendState state, const char 
*cmd_str)
 */
if (beentry->st_state == STATE_RUNNING ||
beentry->st_state == STATE_FASTPATH)
-   {
-   pgstat_count_conn_active_time((PgStat_Counter) usecs_diff);
active_time_diff = usecs_diff;
-   }
else
-   {
-   pgstat_count_conn_txn_idle_time((PgStat_Counter) usecs_diff);

Re: Extensible Rmgr for Table AMs

2022-01-31 Thread Julien Rouhaud
Hi,

On Mon, Jan 31, 2022 at 06:36:59PM -0800, Jeff Davis wrote:
> On Tue, 2022-01-18 at 17:53 +0800, Julien Rouhaud wrote:
> 
> > There's a bit of 0003 present in 0002:
> 
> I just squashed 0002 and 0003 together. Not large enough to keep
> separate.

Agreed.

> > +   elog(LOG, "registering customer rmgr \"%s\" with ID %d",
> > +rmgr->rm_name, rmid);
> > 
> > Should it be a DEBUG message instead?  Also s/customer/custom/
> 
> It seems like a fairly important thing to have in the log. Only a
> couple extensions will ever encounter this message, and only at server
> start.

Ok.

> > +RmgrData
> > +GetCustomRmgr(RmgrId rmid)
> > +{
> > +   if (rmid < RM_CUSTOM_MIN_ID || rmid > RM_CUSTOM_MAX_ID)
> > +   elog(PANIC, "custom rmgr id %d out of range", rmid);
> > 
> > Should this be an assert?
> 
> If we make it an Assert, then it won't be caught in production builds.

Sure, but it seems like something so fundamental that it should be gotten
right during the development phase rather than spending cycles in optimized
builds to check for it.
But even if that happened it would get caught by the final "not found" PANIC
anyway, and for end users I don't think that getting this error instead would
make much difference.

> > How you plan to mark it experimental?
> 
> I suppose it doesn't need to be marked explicitly -- there are other
> APIs that change. For instance, the ProcessUtility_hook changed, and
> that's used much more widely.
> As long as we generally agree that some kind of custom rmgrs are the
> way to go, if we change the API or implementation around from version
> to version I can easily update my table access method.

Agreed, API breaks happen often and that's not really a problem for third-party
extensions, especially if it comes with a hard compile failure.

Other than that the patch looks good to me, as you said we just need a decision
on whether custom rmgrs are wanted or not.




Re: Add checkpoint and redo LSN to LogCheckpointEnd log message

2022-01-31 Thread Bharath Rupireddy
On Tue, Feb 1, 2022 at 9:49 AM Fujii Masao  wrote:
>
> My previous comment was confusing... Probably I understand why you tried to 
> put this information in checkpoint log message. But I was suggesting to put 
> that information at the end of log message instead of the beginning of it. 
> Because ordinary users would be less interested in this LSN information than 
> other ones like the number of buffers written.

Actually, there's no strong reason to put LSN info at the beginning of
the message except that LSN/REDO LSN next to the
checkpoint/restartpoint complete would make the users understand the
LSN and REDO LSN belong to the checkpoint/restartpoint. Since this
wasn't a strong reason, I agree to keep it at the end.

Modified in v8.

[1]
2022-02-01 04:34:17.657 UTC [3597073] LOG:  checkpoint complete: wrote
21 buffers (0.1%); 0 WAL file(s) added, 0 removed, 0 recycled;
write=0.004 s, sync=0.008 s, total=0.031 s; sync files=18,
longest=0.006 s, average=0.001 s; distance=77 kB, estimate=77 kB;
lsn=0/14D5AF0, redo lsn=0/14D5AB8

Regards,
Bharath Rupireddy.


v8-0001-Add-checkpoint-and-redo-LSN-to-LogCheckpointEnd-l.patch
Description: Binary data


v8-0002-Change-location-to-lsn-in-pg_controldata.patch
Description: Binary data


RE: [Proposal] Add foreign-server health checks infrastructure

2022-01-31 Thread kuroda.hay...@fujitsu.com
Dear Fujii-san,

Thank you for reviewing! I attached the latest version.

> When more than one FDWs are used, even if one FDW calls this function to
> disable the timeout, its remote-server-check-callback can still be called. Is 
> this
> OK?
> Please imagine the case where two FDWs are used and they registered their
> callbacks. Even when one FDW calls TryDisableRemoteServerCheckingTimeout(),
> if another FDW has not called that yet, the timeout is still being enabled. 
> If the
> timeout is triggered during that period, even the callback registered by the 
> FDW
> that has already called TryDisableRemoteServerCheckingTimeout() would be
> called.

Indeed, and it should be avoided. I added a counter to
CheckingRemoteServersCallbackItem. The register function returns the
registered item, and it must be passed as the argument to the enable and
disable functions. Callback functions will be called when item->counter is
larger than zero.
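
Roughly, the shape is the following (a sketch from the description above;
the exact names may differ from the attached patches):

  typedef struct CheckingRemoteServersCallbackItem
  {
      CheckingRemoteServersCallback callback;
      void       *arg;
      int         counter;        /* callback fires only while counter > 0 */
      struct CheckingRemoteServersCallbackItem *next;
  } CheckingRemoteServersCallbackItem;

  /* FDW usage: keep the returned item and pass it back later */
  item = RegisterCheckingRemoteServersCallback(my_check_callback, NULL);
  TryEnableRemoteServerCheckingTimeout(item);     /* item->counter++ */
  TryDisableRemoteServerCheckingTimeout(item);    /* item->counter-- */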

> + if (remote_servers_connection_check_interval > 0)
> + {
> + CallCheckingRemoteServersCallbacks();
> +
> + enable_timeout_after(CHECKING_REMOTE_SERVERS_TIMEOUT,
> +                      remote_servers_connection_check_interval);
> 
> LockErrorCleanup() needs to be called before reporting the error, doesn't it?

You are right. I think this suggests that error reporting should be done in
the core layer. For meaningful error reporting, I changed the callback
specification so that it returns a ForeignServer* pointing to the downed
remote server.

> This can cause an error even while DoingCommandRead == true. Is that safe?

I read the code again and I think it is not safe. It is OK when
whereToSendOutput is DestRemote, but not good in InteractiveBackend().

I moved the if-statement for CheckingRemoteServersTimeoutPending to just
after the ClientConnectionLost check, because the latter may throw a FATAL
error and disconnect from the client.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED



v07_0001_add_checking_infrastracture.patch
Description: v07_0001_add_checking_infrastracture.patch


v07_0002_add_doc.patch
Description: v07_0002_add_doc.patch


Add DBState to pg_control_system function

2022-01-31 Thread Bharath Rupireddy
Hi,

I think emitting DBState (starting up, shut down, shut down in
recovery, shutting down, in crash recovery, in archive recovery, in
production) via the pg_control_system function would help know the
database state, especially during PITR/archive recovery. During
archive recovery, the server can open up for read-only connections
even before the archive recovery finishes. Having pg_control_system
emit the database state would help users/service layers know it, so
they can take actions based on it.

Attaching a patch herewith.

Thoughts?
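
For example, a monitoring layer could then poll the state with something
like this (the column name is hypothetical, not from the patch):

  SELECT state FROM pg_control_system();
  -- e.g. 'in archive recovery' vs. 'in production'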

Regards,
Bharath Rupireddy.


v1-0001-Add-DBState-to-pg_control_system-function.patch
Description: Binary data


Re: Add checkpoint and redo LSN to LogCheckpointEnd log message

2022-01-31 Thread Fujii Masao




On 2022/02/01 13:01, Bharath Rupireddy wrote:
> On Tue, Feb 1, 2022 at 9:10 AM Fujii Masao  wrote:
>> The order of arguments for LSN seems wrong.
>> LSN_FORMAT_ARGS(ControlFile->checkPoint) should be specified ahead of
>> LSN_FORMAT_ARGS(ControlFile->checkPointCopy.redo)?
>
> Thanks. Corrected.

Thanks!

>> Could you tell me why the information for LSN is reported earlier in the
>> log message? Since ordinary users would be more interested in the
>> information about I/O by checkpoint, the information for LSN should be
>> placed later? Sorry if this was already discussed.
>
> It is useful (for debugging purposes) if the checkpoint end message
> has the checkpoint LSN(end) and REDO LSN(start). It gives more context
> while analyzing checkpoint-related issues. The pg_controldata gives
> the last checkpoint LSN and REDO LSN, but having this info alongside
> the log message helps analyze issues that happened previously, connect
> the dots and identify the root cause.

My previous comment was confusing... Probably I understand why you tried to put
this information in the checkpoint log message. But I was suggesting putting
that information at the end of the log message instead of the beginning of it,
because ordinary users would be less interested in this LSN information than
other ones like the number of buffers written.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: Add checkpoint and redo LSN to LogCheckpointEnd log message

2022-01-31 Thread Bharath Rupireddy
On Tue, Feb 1, 2022 at 9:10 AM Fujii Masao  wrote:
> The order of arguments for LSN seems wrong. 
> LSN_FORMAT_ARGS(ControlFile->checkPoint) should be specified ahead of 
> LSN_FORMAT_ARGS(ControlFile->checkPointCopy.redo)?

Thanks. Corrected.

> Could you tell me why the information for LSN is reported earlier in the 
> log message? Since ordinary users would be more interested in the 
> information about I/O by checkpoint, the information for LSN should be placed 
> later? Sorry if this was already discussed.

It is useful (for debugging purposes) if the checkpoint end message
has the checkpoint LSN(end) and REDO LSN(start). It gives more context
while analyzing checkpoint-related issues. The pg_controldata gives
the last checkpoint LSN and REDO LSN, but having this info alongside
the log message helps analyze issues that happened previously, connect
the dots and identify the root cause.

Regards,
Bharath Rupireddy.


v7-0001-Add-checkpoint-and-redo-LSN-to-LogCheckpointEnd-l.patch
Description: Binary data


v7-0002-Change-location-to-lsn-in-pg_controldata.patch
Description: Binary data


RE: row filtering for logical replication

2022-01-31 Thread houzj.f...@fujitsu.com
> On Saturday, January 29, 2022 8:31 AM Andres Freund 
> wrote:
> >
> > Hi,
> >
> > Are there any recent performance evaluations of the overhead of row
> > filters? I think it'd be good to get some numbers comparing:
> 
> Thanks for looking at the patch! Will test it.
> 
> > >   case REORDER_BUFFER_CHANGE_INSERT:
> > >   {
> > > - HeapTuple   tuple = 
> > > >data.tp.newtuple->tuple;
> > > + /*
> > > +  * Schema should be sent before the logic that 
> > > replaces the
> > > +  * relation because it also sends the 
> > > ancestor's relation.
> > > +  */
> > > + maybe_send_schema(ctx, change, relation, 
> > > relentry);
> > > +
> > > + new_slot = relentry->new_slot;
> > > +
> > > + ExecClearTuple(new_slot);
> > > + 
> > > ExecStoreHeapTuple(>data.tp.newtuple->tuple,
> > > +new_slot, 
> > > false);
> >
> > Why? This isn't free, and you're doing it unconditionally. I'd bet this
> > alone is a noticeable slowdown over the current state.
>
> It was intended to avoid deforming the tuple twice, once in the row filter
> execution and a second time in logicalrep_write_tuple. But I will test the
> performance impact of this and improve it if needed.

I removed the unnecessary ExecClearTuple here. I think the ExecStoreHeapTuple
here doesn't allocate or free any memory, and it doesn't seem to have a
noticeable impact per the perf result[1]. And we need it to avoid deforming
the tuple twice, so it looks acceptable to me.

[1] 0.01% 0.00%  postgres  pgoutput.so [.] ExecStoreHeapTuple@plt

Best regards,
Hou zj


Re: Add checkpoint and redo LSN to LogCheckpointEnd log message

2022-01-31 Thread Fujii Masao




On 2022/02/01 10:44, Nathan Bossart wrote:
> On Tue, Feb 01, 2022 at 12:23:10AM +0530, Bharath Rupireddy wrote:
>> On Tue, Feb 1, 2022 at 12:00 AM Nathan Bossart wrote:
>>> I think the pg_controldata change needs some extra spaces for alignment,
>>> but otherwise these patches seem reasonable to me.
>>
>> Thanks. My bad, it was. Changed in v6.

-   (errmsg("restartpoint complete: wrote %d buffers (%.1f%%); "
+   (errmsg("restartpoint complete: lsn=%X/%X, redo lsn=%X/%X; "
+           "wrote %d buffers (%.1f%%); "
            "%d WAL file(s) added, %d removed, %d recycled; "
            "write=%ld.%03d s, sync=%ld.%03d s, total=%ld.%03d s; "
            "sync files=%d, longest=%ld.%03d s, average=%ld.%03d s; "
            "distance=%d kB, estimate=%d kB",
+           LSN_FORMAT_ARGS(ControlFile->checkPointCopy.redo),
+           LSN_FORMAT_ARGS(ControlFile->checkPoint),

The order of arguments for LSN seems wrong.
LSN_FORMAT_ARGS(ControlFile->checkPoint) should be specified ahead of
LSN_FORMAT_ARGS(ControlFile->checkPointCopy.redo)?

Could you tell me why the information for LSN is reported earlier in the log
message? Since ordinary users would be more interested in the information
about I/O by checkpoint, the information for LSN should be placed later? Sorry
if this was already discussed.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: Deparsing rewritten query

2022-01-31 Thread Julien Rouhaud
Hi,

On Mon, Jan 31, 2022 at 10:05:44PM +0100, Pavel Stehule wrote:
> 
> I don't feel good about forcing an alias. relname doesn't ensure
> uniqueness. You can have two views with the same name from different
> schemas. Moreover this field is necessary only when a deparsed query is
> printed, not always.

Yes I agree.

> Isn't it possible to compute the correct subquery alias at print time when
> it is missing?

Actually I think that the current code already does everything needed to
generate unique refnames; it's just that they don't get printed for a query
after view expansion.  I modified the patch to simply make sure that an alias
is displayed when it's a subquery, and the output using a custom
pg_get_query_def looks like this:

# select pg_get_query_def('select * from nsp1.v1');
      pg_get_query_def
-------------------------------
  SELECT nb                    +
    FROM ( SELECT 1 AS nb) v1; +

(1 row)


# select pg_get_query_def('select * from nsp1.v1, nsp2.v1');
      pg_get_query_def
-------------------------------
  SELECT v1.nb,                +
         v1_1.nb               +
    FROM ( SELECT 1 AS nb) v1, +
         ( SELECT 1 AS nb) v1_1;

(1 row)
From 093cc350c09b3e0a458822da7f541ca602af4ef6 Mon Sep 17 00:00:00 2001
From: Julien Rouhaud 
Date: Sun, 27 Jun 2021 11:39:47 +0800
Subject: [PATCH v3] Add pg_get_query_def() to deparse and print a rewritten
 SQL statement.

---
 src/backend/utils/adt/ruleutils.c   | 75 +
 src/include/catalog/pg_proc.dat |  3 ++
 src/test/regress/expected/rules.out | 26 ++
 src/test/regress/sql/rules.sql  |  3 ++
 4 files changed, 107 insertions(+)

diff --git a/src/backend/utils/adt/ruleutils.c 
b/src/backend/utils/adt/ruleutils.c
index 039b1d2b95..1186438757 100644
--- a/src/backend/utils/adt/ruleutils.c
+++ b/src/backend/utils/adt/ruleutils.c
@@ -49,6 +49,8 @@
 #include "nodes/nodeFuncs.h"
 #include "nodes/pathnodes.h"
 #include "optimizer/optimizer.h"
+#include "parser/analyze.h"
+#include "parser/parse_node.h"
 #include "parser/parse_agg.h"
 #include "parser/parse_func.h"
 #include "parser/parse_node.h"
@@ -58,6 +60,7 @@
 #include "rewrite/rewriteHandler.h"
 #include "rewrite/rewriteManip.h"
 #include "rewrite/rewriteSupport.h"
+#include "tcop/tcopprot.h"
 #include "utils/array.h"
 #include "utils/builtins.h"
 #include "utils/fmgroids.h"
@@ -493,6 +496,68 @@ static void get_reloptions(StringInfo buf, Datum 
reloptions);
 
 #define only_marker(rte)  ((rte)->inh ? "" : "ONLY ")
 
+/* return the query as postgres will rewrite */
+Datum
+pg_get_query_def(PG_FUNCTION_ARGS)
+{
+   char   *sql = TextDatumGetCString(PG_GETARG_TEXT_PP(0));
+   List   *parsetree_list;
+   List   *querytree_list;
+   RawStmt*parsetree;
+   Query   *query;
+   boolsnapshot_set = false;
+   StringInfoData  buf;
+   StringInfoData  res;
+   ListCell   *lc;
+
+   parsetree_list = pg_parse_query(sql);
+
+   /* only support one statement at a time */
+   if (list_length(parsetree_list) != 1)
+   ereport(ERROR,
+   (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+errmsg("a single statement should be 
provided")));
+
+   initStringInfo(&res);
+
+   parsetree = linitial_node(RawStmt, parsetree_list);
+
+   /*
+* Set up a snapshot if parse analysis/planning will need one.
+*/
+   if (analyze_requires_snapshot(parsetree))
+   {
+   PushActiveSnapshot(GetTransactionSnapshot());
+   snapshot_set = true;
+   }
+
+   querytree_list = pg_analyze_and_rewrite(parsetree, sql,
+   NULL, 0, NULL);
+
+   /* Done with the snapshot used for parsing/planning */
+   if (snapshot_set)
+   PopActiveSnapshot();
+
+   foreach(lc, querytree_list)
+   {
+   query = (Query *) lfirst(lc);
+   initStringInfo(&buf);
+
+   if (query->utilityStmt)
+   appendStringInfo(&res, "%s;\n", sql);
+   else
+   {
+   get_query_def(query, &buf, NIL, NULL,
+ PRETTYFLAG_INDENT,
+  WRAP_COLUMN_DEFAULT, 0);
+
+   appendStringInfo(&res, "%s;\n", buf.data);
+   }
+   }
+   pfree(buf.data);
+
+   PG_RETURN_TEXT_P(string_to_text(res.data));
+}
 
 /* --
  * pg_get_ruledef  - Do it all and return a text
@@ -10989,6 +11054,16 @@ get_from_clause_item(Node *jtnode, Query *query, 
deparse_context *context)
if (strcmp(refname, rte->ctename) != 0)
printalias = true;
}
+   else if (rte->rtekind == RTE_SUBQUERY)
+   {
+   /*
+* For a subquery RTE, always print alias.  A 
user-specified 

Re: row filtering for logical replication

2022-01-31 Thread Greg Nancarrow
On Mon, Jan 31, 2022 at 12:57 PM houzj.f...@fujitsu.com
 wrote:
>
> Attach the V74 patch set which did the following changes:
>

In the v74-0001 patch, I noticed the following code in get_rel_sync_entry():

+ /*
+ * Tuple slots cleanups. (Will be rebuilt later if needed).
+ */
+ oldctx = MemoryContextSwitchTo(data->cachectx);
+
+ if (entry->old_slot)
+ ExecDropSingleTupleTableSlot(entry->old_slot);
+ if (entry->new_slot)
+ ExecDropSingleTupleTableSlot(entry->new_slot);
+
+ entry->old_slot = NULL;
+ entry->new_slot = NULL;
+
+ MemoryContextSwitchTo(oldctx);

I don't believe the calls to MemoryContextSwitchTo() are required
here, because within the context switch it's just freeing memory, not
allocating it.
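
That is, the cleanup could presumably be reduced to:

  /* Tuple slots cleanups. (Will be rebuilt later if needed). */
  if (entry->old_slot)
      ExecDropSingleTupleTableSlot(entry->old_slot);
  if (entry->new_slot)
      ExecDropSingleTupleTableSlot(entry->new_slot);

  entry->old_slot = NULL;
  entry->new_slot = NULL;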

Regards,
Greg Nancarrow
Fujitsu Australia




Re: Testing autovacuum wraparound (including failsafe)

2022-01-31 Thread Masahiko Sawada
On Fri, Jun 11, 2021 at 10:19 AM Andres Freund  wrote:
>
> Hi,
>
> On 2021-06-10 16:42:01 +0300, Anastasia Lubennikova wrote:
> > Cool. Thank you for working on that!
> > Could you please share a WIP patch for the $subj? I'd be happy to help with
> > it.
>
> I've attached the current WIP state, which hasn't evolved much since
> this message... I put the test in 
> src/backend/access/heap/t/001_emergency_vacuum.pl
> but I'm not sure that's the best place. But I didn't think
> src/test/recovery is great either.
>

Thank you for sharing the WIP patch.

Regarding point (1) you mentioned (StartupSUBTRANS() takes a long time
for zeroing out all pages), how about using single-user mode instead
of preparing the transaction? That is, after pg_resetwal we check the
ages of datfrozenxid by executing a query in single-user mode. That
way, we don’t need to worry about autovacuum concurrently running
while checking the ages of frozenxids. I’ve attached a PoC patch that
does the scenario like:

1. start cluster with autovacuum=off and create tables with a few data
and make garbage on them
2. stop cluster and do pg_resetwal
3. start cluster in single-user mode
4. check age(datfrozenxid)
5. stop cluster
6. start cluster and wait for autovacuums to increase template0,
template1, and postgres datfrozenxids

I put the new tests in src/test/modules/heap since we already have tests
for brin in src/test/modules/brin.

I think that a TAP test facility to run queries in single-user mode will
also be helpful for testing a new vacuum option/command that is intended
for use in emergency cases, proposed here[1].
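
For reference, driving single-user mode from a TAP test can be as small as
this sketch (using IPC::Run directly, as the attached PoC does; the exact
query and helper usage may differ):

  my $in  = "SELECT datname, age(datfrozenxid) FROM pg_database;\n";
  my $out = '';
  IPC::Run::run(
      [ 'postgres', '--single', '-D', $node_primary->data_dir, 'postgres' ],
      '<', \$in, '>', \$out);
  # $out now holds the single-user-mode output to parse for the ages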

Regards,

[1]  
https://www.postgresql.org/message-id/flat/20220128012842.GZ23027%40telsasoft.com#b76c13554f90d1c8bb5532d6f3e5cbf8


--
Masahiko Sawada
EDB:  https://www.enterprisedb.com/
diff --git a/src/test/modules/heap/.gitignore b/src/test/modules/heap/.gitignore
new file mode 100644
index 00..716e17f5a2
--- /dev/null
+++ b/src/test/modules/heap/.gitignore
@@ -0,0 +1,2 @@
+# Generated subdirectories
+/tmp_check/
diff --git a/src/test/modules/heap/Makefile b/src/test/modules/heap/Makefile
new file mode 100644
index 00..d3c08a04b7
--- /dev/null
+++ b/src/test/modules/heap/Makefile
@@ -0,0 +1,14 @@
+# src/test/modules/heap/Makefile
+
+TAP_TESTS = 1
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = src/test/modules/heap
+top_builddir = ../../../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/src/test/modules/heap/t/001_emergency_vacuum.pl b/src/test/modules/heap/t/001_emergency_vacuum.pl
new file mode 100644
index 00..3229f99921
--- /dev/null
+++ b/src/test/modules/heap/t/001_emergency_vacuum.pl
@@ -0,0 +1,131 @@
+# Copyright (c) 2022, PostgreSQL Global Development Group
+
+# Test for wraparound emergency situation
+
+use strict;
+use warnings;
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More tests => 8;
+use IPC::Run qw(pump finish timer);
+
+# Initialize primary node
+my $node_primary = PostgreSQL::Test::Cluster->new('primary');
+
+$node_primary->init(allows_streaming => 1);
+$node_primary->append_conf('postgresql.conf', qq/
+autovacuum = off # run autovacuum only for anti-wraparound work
+max_prepared_transactions=10
+autovacuum_naptime = 1s
+# So it's easier to verify the order of operations
+autovacuum_max_workers=1
+autovacuum_vacuum_cost_delay=0
+log_autovacuum_min_duration=0
+/);
+$node_primary->start;
+
+#
+# Create tables for a few different test scenarios
+#
+
+$node_primary->safe_psql('postgres', qq/
+CREATE TABLE large(id serial primary key, data text, filler text default repeat(random()::text, 10));
+INSERT INTO large(data) SELECT generate_series(1,3);
+
+CREATE TABLE large_trunc(id serial primary key, data text, filler text default repeat(random()::text, 10));
+INSERT INTO large_trunc(data) SELECT generate_series(1,3);
+
+CREATE TABLE small(id serial primary key, data text, filler text default repeat(random()::text, 10));
+INSERT INTO small(data) SELECT generate_series(1,15000);
+
+CREATE TABLE small_trunc(id serial primary key, data text, filler text default repeat(random()::text, 10));
+INSERT INTO small_trunc(data) SELECT generate_series(1,15000);
+
+CREATE TABLE autovacuum_disabled(id serial primary key, data text) WITH (autovacuum_enabled=false);
+INSERT INTO autovacuum_disabled(data) SELECT generate_series(1,1000);
+/);
+
+# Delete a few rows to ensure that vacuum has work to do.
+$node_primary->safe_psql('postgres', qq/
+DELETE FROM large WHERE id % 2 = 0;
+DELETE FROM large_trunc WHERE id > 1;
+DELETE FROM small WHERE id % 2 = 0;
+DELETE FROM small_trunc WHERE id > 1000;
+DELETE FROM autovacuum_disabled WHERE id % 2 = 0;
+/);
+
+
+# Stop the server and temporarily disable log_statement while running in single-user mode
+$node_primary->stop;
+$node_primary->append_conf('postgresql.conf', qq/
+log_statement = 'none'

Re: Make mesage at end-of-recovery less scary.

2022-01-31 Thread Kyotaro Horiguchi
Hi, Pavel.

At Mon, 31 Jan 2022 15:17:09 +0400, Pavel Borisov  
wrote in 
> I don't quite understand the meaning of a comment:
>  /* it is completely zeroed, call it a day  */

While rethinking this comment, it came to my mind that
XLogReaderValidatePageHeader does a whole-page check.  There is no
clear reason for not doing at least the same check here.
ValidXLogRecordHeader is changed to check all bytes in the rest of the
page, instead of just the record header.

While working on that, I noticed another end-of-WAL case: an unexpected
pageaddr.  I think we can assume it is safe when the pageaddr is smaller
than expected (or we have no choice but to assume so).
XLogReaderValidatePageHeader is changed that way, but I'm not sure
others regard it as a form of safe end-of-WAL.
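
Concretely, the new check amounts to something like this (a sketch, not the
exact hunk; EndOfWAL is the flag added by this patch):

  if (hdr->xlp_pageaddr != recaddr)
  {
      /*
       * A smaller pageaddr than expected usually means we are reading
       * leftover contents of a recycled segment, so treat it as a safe
       * end-of-WAL condition rather than corruption.
       */
      if (hdr->xlp_pageaddr < recaddr)
          state->EndOfWAL = true;

      report_invalid_record(state, "unexpected pageaddr %X/%X",
                            LSN_FORMAT_ARGS(hdr->xlp_pageaddr));
      return false;
  }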

> Please also run pgindent on your code.

Hmm. I'm not sure we need to do that at this stage. pgindent makes
changes across the whole file, involving parts unrelated to this patch.
Anyway, I did that and then removed the irrelevant edits.

pgindent makes a seemingly not-great suggestion.

+   char   *pe =
+   (char *) record + XLOG_BLCKSZ - (RecPtr & (XLOG_BLCKSZ - 1));

I'm not sure this is intended, but I split the line into a separate
definition and assignment.

> Otherwise the new patch seems ok.

Thanks!

This version 10 is changed in the following points.

- Rewrote the comment in ValidXLogRecordHeader.
- ValidXLogRecordHeader

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
From 9eacdd050a8041b358df11ca3e18c1071b693d20 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi 
Date: Fri, 28 Feb 2020 15:52:58 +0900
Subject: [PATCH v10] Make End-Of-Recovery error less scary

When recovery of any type ends, we see a somewhat scary error message
like "invalid record length" that suggests something serious is
happening. Actually, if recovery meets a record with length = 0, that
usually means it finished applying all available WAL records.

Make this message less scary, as "reached end of WAL". Instead, raise
the error level for other kinds of WAL failure to WARNING.
---
 src/backend/access/transam/xlog.c |  91 +-
 src/backend/access/transam/xlogreader.c   |  77 +++
 src/backend/replication/walreceiver.c |   7 +-
 src/bin/pg_waldump/pg_waldump.c   |  13 ++-
 src/include/access/xlogreader.h   |   1 +
 src/test/recovery/t/011_crash_recovery.pl | 110 +-
 6 files changed, 269 insertions(+), 30 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index dfe2a0bcce..378c13ccf7 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -4480,6 +4480,7 @@ ReadRecord(XLogReaderState *xlogreader, int emode,
 	for (;;)
 	{
 		char	   *errormsg;
+		XLogRecPtr	ErrRecPtr = InvalidXLogRecPtr;
 
 		record = XLogReadRecord(xlogreader, );
 		if (record == NULL)
@@ -4495,6 +4496,18 @@ ReadRecord(XLogReaderState *xlogreader, int emode,
 			{
 abortedRecPtr = xlogreader->abortedRecPtr;
 missingContrecPtr = xlogreader->missingContrecPtr;
+ErrRecPtr = abortedRecPtr;
+			}
+			else
+			{
+/*
+ * NULL ReadRecPtr means we could not read a record at the
+ * beginning. In that case EndRecPtr is storing the LSN of the
+ * record we tried to read.
+ */
+ErrRecPtr =
+	xlogreader->ReadRecPtr ?
+	xlogreader->ReadRecPtr : xlogreader->EndRecPtr;
 			}
 
 			if (readFile >= 0)
@@ -4504,13 +4517,12 @@ ReadRecord(XLogReaderState *xlogreader, int emode,
 			}
 
 			/*
-			 * We only end up here without a message when XLogPageRead()
-			 * failed - in that case we already logged something. In
-			 * StandbyMode that only happens if we have been triggered, so we
-			 * shouldn't loop anymore in that case.
+			 * If we get here for other than end-of-wal, emit the error
+			 * message right now. Otherwise the message if any is shown as a
+			 * part of the end-of-WAL message below.
 			 */
-			if (errormsg)
-ereport(emode_for_corrupt_record(emode, xlogreader->EndRecPtr),
+			if (!xlogreader->EndOfWAL && errormsg)
+ereport(emode_for_corrupt_record(emode, ErrRecPtr),
 		(errmsg_internal("%s", errormsg) /* already translated */ ));
 		}
 
@@ -4541,11 +4553,12 @@ ReadRecord(XLogReaderState *xlogreader, int emode,
 			/* Great, got a record */
 			return record;
 		}
-		else
+
+		/* No valid record available from this source */
+		lastSourceFailed = true;
+
+		if (!fetching_ckpt)
 		{
-			/* No valid record available from this source */
-			lastSourceFailed = true;
-
 			/*
 			 * If archive recovery was requested, but we were still doing
 			 * crash recovery, switch to archive recovery and retry using the
@@ -4558,11 +4571,17 @@ ReadRecord(XLogReaderState *xlogreader, int emode,
 			 * we'd have no idea how far we'd have to replay to reach
 			 * consistency.  So err on the safe side and give up.
 			 */
-			if (!InArchiveRecovery && ArchiveRecoveryRequested &&
-

Re: row filtering for logical replication

2022-01-31 Thread Amit Kapila
On Sat, Jan 29, 2022 at 6:01 AM Andres Freund  wrote:
>
>
> > + if (has_filter)
> > + {
> > + /* Create or reset the memory context for row filters */
> > + if (entry->cache_expr_cxt == NULL)
> > + entry->cache_expr_cxt = 
> > AllocSetContextCreate(CacheMemoryContext,
> > +   
> > "Row filter expressions",
> > +   
> > ALLOCSET_DEFAULT_SIZES);
> > + else
> > + MemoryContextReset(entry->cache_expr_cxt);
>
> I see this started before this patch, but I don't think it's a great idea that
> pgoutput does a bunch of stuff in CacheMemoryContext. That makes it
> unnecessarily hard to debug leaks.
>
> Seems like all this should live somwhere below ctx->context, allocated in
> pgoutput_startup()?
>

Agreed.

> Consider what happens in a long-lived replication connection, where
> occasionally there's a transient error causing streaming to stop. At that
> point you'll just lose all knowledge of entry->cache_expr_cxt, no?
>

I think we will lose that knowledge because the walsender exits on ERROR,
but that would be true even when we allocate it in this newly allocated
context. Am I missing something?

-- 
With Regards,
Amit Kapila.




Re: A qsort template

2022-01-31 Thread John Naylor
I wrote:

> 0010 - Thresholds on my TODO list.

I did some basic tests on the insertion sort thresholds, and it looks
like we could safely and profitably increase the current value from 7
to 20 or so, in line with other more recent implementations. I've
attached an addendum on top of 0012 and the full test results on an
Intel Coffee Lake machine with gcc 11.1. I found that the object test
setup in 0012 had some kind of bug that was comparing the pointers of
the object array. Rather than fix that, I decided to use Datums, but
with the two extremes in comparator: simple branching with machine
instructions vs. a SQL-callable function. The papers I've read
indicate the results for Datum sizes would not be much different for
small structs. The largest existing sort element is SortTuple, but
that's only 24 bytes and has a bulky comparator as well.

The first thing to note is that I rejected outright any testing of a
"middle value" where the pivot is simply the middle of the array. Even
the Bently and McIlroy paper which is the reference for our
implementation says "The range that consists of the single integer 7
could be eliminated, but has been left adjustable because on some
machines larger ranges are a few percent better".

I tested thresholds up to 64, which is where I guessed results would get
worse (most implementations are smaller than that). Here are the best
thresholds at a quick glance:

- elementary comparator:

random: 16 or greater
decreasing, rotate: get noticeably better all the way up to 64
organ: little difference, but seems to get better all the way up to 64
0/1: seems to get worse above 20

- SQL-callable comparator:

random: between 12 and 20, but slight differences until 32
decreasing, rotate: get noticeably better all the way up to 64
organ: seems best at 12, but slight differences until 32
0/1: slight differences

Based on these tests and this machine, it seems 20 is a good default
value. I'll repeat this test on one older Intel and one non-Intel
platform with older compilers.

--
Running tally of patchset:

0001 - bsearch and unique are good to have, and we can keep the return
type pending further tests -- if none happen this cycle, suggest
committing this without the return type symbol.
0002/3 - I've yet to see a case where branchless comparators win, but
other than that, these are good. Notational improvement and not
performance sensitive.

0004/5 - Computing the arguments slows it down, but accessing the
underlying int16s gives an improvement. [1] Haven't done an in-situ
test on VACUUM. Could be worth it for pg15, since I imagine the
proposals for dead tuple storage won't be ready this cycle.
0006 - I expect this to be slower too. I also wonder if this could
also use the global function in 0004 once it's improved.

0007 - untested

0008 - Good performance in microbenchmarks, no in-situ testing.
Inlined reversal is not worth the binary space or notational overhead.

0009 - Based on 0004, I would guess that computing the arguments is
too slow. Not sure how to test in-situ to see if specializing helps.

0010 - Suggest leaving out the middle threshold and setting the
insertion sort threshold to ~20. Might also name them
ST_INSERTION_SORT_THRESHOLD and ST_NINTHER_THRESHOLD. (TODO: test on
other platforms)

0011 - Committed.

v3-0001 comparators for abbreviated keys - Clearly a win in this state
already, especially for the "unsigned" case [2] (gist untested). There
are additional possible improvements mentioned, but they seem like PG16
projects.

[1] 
https://www.postgresql.org/message-id/CA%2BhUKG%2BS5SMoG8Z2PHj0bsK70CxVLgqQR1orQJq6Cjgibu26vA%40mail.gmail.com
[2] 
https://www.postgresql.org/message-id/CAFBsxsEFGAJ9eBpQVb5a86BE93WER3497zn2OT5wbjm1HHcqgA%40mail.gmail.com
(TODO: refine test)

-- 
John Naylor
EDB: http://www.enterprisedb.com
NOTICE:  [direct] size=8MB, order=random, threshold=7, test=0, time=0.084503
NOTICE:  [direct] size=8MB, order=random, threshold=7, test=1, time=0.087980
NOTICE:  [direct] size=8MB, order=random, threshold=7, test=2, time=0.084299
NOTICE:  [direct] size=8MB, order=random, threshold=12, test=0, time=0.081893
NOTICE:  [direct] size=8MB, order=random, threshold=12, test=1, time=0.081907
NOTICE:  [direct] size=8MB, order=random, threshold=12, test=2, time=0.081943
NOTICE:  [direct] size=8MB, order=random, threshold=16, test=0, time=0.080810
NOTICE:  [direct] size=8MB, order=random, threshold=16, test=1, time=0.080793
NOTICE:  [direct] size=8MB, order=random, threshold=16, test=2, time=0.080922
NOTICE:  [direct] size=8MB, order=random, threshold=20, test=0, time=0.080399
NOTICE:  [direct] size=8MB, order=random, threshold=20, test=1, time=0.080433
NOTICE:  [direct] size=8MB, order=random, threshold=20, test=2, time=0.080413
NOTICE:  [direct] size=8MB, order=random, threshold=24, test=0, time=0.080342
NOTICE:  [direct] size=8MB, order=random, threshold=24, test=1, time=0.080446
NOTICE:  [direct] size=8MB, order=random, threshold=24, test=2, time=0.080339
NOTICE:  

Re: Extensible Rmgr for Table AMs

2022-01-31 Thread Jeff Davis
On Tue, 2022-01-18 at 17:53 +0800, Julien Rouhaud wrote:
> > Review on patch 0001 would be helpful. It makes decoding just
> > another
> > method of rmgr, which makes a lot of sense to me from a code
> > organization standpoint regardless of the other patches. Is there
> > any
> > reason not to do that?
> 
> I think that it's a clean and sensible approach, so +1 for me.

Thank you, committed 0001. Other patches not committed yet.

> There's a bit of 0003 present in 0002:

I just squashed 0002 and 0003 together. Not large enough to keep
separate.

> A few random comments on 0003:
> 
>  #define RM_MAX_ID  (RM_NEXT_ID - 1)
> +#define RM_CUSTOM_MIN_ID   128
> +#define RM_CUSTOM_MAX_ID   UINT8_MAX
> 
> It would be a good idea to add a StaticAssertStmt here to make sure
> that
> there's no overlap in the ranges.

Done.

> +
> +/*
> + * RmgrId to use for extensions that require an RmgrId, but are
> still in
> + * development and have not reserved their own unique RmgrId yet.
> See:
> + * https://wiki.postgresql.org/wiki/ExtensibleRmgr
> + */
> +#define RM_EXPERIMENTAL_ID 128
> 
> I'm a bit dubious about the whole "register your ID in the Wiki"
> approach.  It
> might not be a problem since there probably won't be hundreds of
> users, and I
> don't have any better suggestion since it has to be consistent across
> nodes.

Agree, but I don't see a better approach, either.

I do some sanity checking, which should catch collisions when they
happen.

> +   elog(LOG, "registering customer rmgr \"%s\" with ID %d",
> +rmgr->rm_name, rmid);
> 
> Should it be a DEBUG message instead?  Also s/customer/custom/

It seems like a fairly important thing to have in the log. Only a
couple extensions will ever encounter this message, and only at server
start.

Typo fixed.

> +RmgrData
> +GetCustomRmgr(RmgrId rmid)
> +{
> +   if (rmid < RM_CUSTOM_MIN_ID || rmid > RM_CUSTOM_MAX_ID)
> +   elog(PANIC, "custom rmgr id %d out of range", rmid);
> 
> Should this be an assert?

If we make it an Assert, then it won't be caught in production builds.

> +#define GetBuiltinRmgr(rmid) RmgrTable[(rmid)]
> +#define GetRmgr(rmid) ((rmid < RM_CUSTOM_MIN_ID) ? \
> +  GetBuiltinRmgr(rmid) : GetCustomRmgr(rmid))
> 
> rmid should be protected in the macro.

Done.

> How you plan to mark it experimental?

I suppose it doesn't need to be marked explicitly -- there are other
APIs that change. For instance, the ProcessUtility_hook changed, and
that's used much more widely.

As long as we generally agree that some kind of custom rmgrs are the
way to go, if we change the API or implementation around from version
to version I can easily update my table access method.
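
For reference, the extension side would look something like the sketch
below.  The registration function name is an assumption here, since the
quoted patch is truncated before the registration code; only
RM_EXPERIMENTAL_ID and the RmgrData struct appear in the diff itself:

/* Hypothetical custom-rmgr registration from an extension's _PG_init(). */
#include "postgres.h"
#include "access/xlog_internal.h"
#include "lib/stringinfo.h"
#include "fmgr.h"

PG_MODULE_MAGIC;

static void
my_am_redo(XLogReaderState *record)
{
	/* replay one of this AM's WAL records */
}

static void
my_am_desc(StringInfo buf, XLogReaderState *record)
{
	appendStringInfoString(buf, "my_table_am record");
}

static const char *
my_am_identify(uint8 info)
{
	return "MY_TABLE_AM_OP";
}

static RmgrData my_am_rmgr = {
	.rm_name = "my_table_am",
	.rm_redo = my_am_redo,
	.rm_desc = my_am_desc,
	.rm_identify = my_am_identify
};

void
_PG_init(void)
{
	/* RM_EXPERIMENTAL_ID (128) is reserved for modules still in
	 * development; the sanity checking mentioned above should catch
	 * ID collisions at registration time */
	RegisterCustomRmgr(RM_EXPERIMENTAL_ID, &my_am_rmgr);
}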

Regards,
Jeff Davis

From 62799e4546aa0a15b2a09f6b14900d785d64f42f Mon Sep 17 00:00:00 2001
From: Jeff Davis 
Date: Sat, 6 Nov 2021 13:01:38 -0700
Subject: [PATCH] Extensible rmgr.

Allow extensions to specify a new custom rmgr, which allows
specialized WAL. This is meant to be used by a custom Table Access
Method, which would not otherwise be able to offer logical
decoding/replication. It may also be used by new Index Access Methods.

Prior to this commit, only Generic WAL was available, which offers
support for recovery and physical replication but not logical
replication.

Reviewed-by: Julien Rouhaud
Discussion: https://postgr.es/m/ed1fb2e22d15d3563ae0eb610f7b61bb15999c0a.camel%40j-davis.com
---
 src/backend/access/transam/rmgr.c| 91 
 src/backend/access/transam/xlog.c| 22 +++---
 src/backend/access/transam/xlogreader.c  |  2 +-
 src/backend/replication/logical/decode.c |  4 +-
 src/backend/utils/misc/guc.c |  6 +-
 src/include/access/rmgr.h| 14 
 src/include/access/xlog_internal.h   |  7 ++
 7 files changed, 129 insertions(+), 17 deletions(-)

diff --git a/src/backend/access/transam/rmgr.c b/src/backend/access/transam/rmgr.c
index f8847d5aebf..e04492a9507 100644
--- a/src/backend/access/transam/rmgr.c
+++ b/src/backend/access/transam/rmgr.c
@@ -24,12 +24,19 @@
 #include "commands/dbcommands_xlog.h"
 #include "commands/sequence.h"
 #include "commands/tablespace.h"
+#include "miscadmin.h"
 #include "replication/decode.h"
 #include "replication/message.h"
 #include "replication/origin.h"
 #include "storage/standby.h"
+#include "utils/memutils.h"
 #include "utils/relmapper.h"
 
+typedef struct CustomRmgrEntry {
+	RmgrId rmid;
+	RmgrData *rmgr;
+} CustomRmgrEntry;
+
 /* must be kept in sync with RmgrData definition in xlog_internal.h */
 #define PG_RMGR(symname,name,redo,desc,identify,startup,cleanup,mask,decode) \
 	{ name, redo, desc, identify, startup, cleanup, mask, decode },
@@ -37,3 +44,87 @@
 const RmgrData RmgrTable[RM_MAX_ID + 1] = {
 #include "access/rmgrlist.h"
 };
+
+static CustomRmgrEntry *CustomRmgrTable = NULL;
+static int NumCustomRmgrs = 0;
+
+/*
+ * Register a new custom rmgr.
+ *
+ * Refer to https://wiki.postgresql.org/wiki/ExtensibleRmgr to 

Re: row filtering for logical replication

2022-01-31 Thread Peter Smith
On Tue, Feb 1, 2022 at 12:07 PM Peter Smith  wrote:
>
> On Sat, Jan 29, 2022 at 11:31 AM Andres Freund  wrote:
> >
> > Hi,
> >
> > Are there any recent performance evaluations of the overhead of row 
> > filters? I
> > think it'd be good to get some numbers comparing:
> >
> > 1) $workload with master
> > 2) $workload with patch, but no row filters
> > 3) $workload with patch, row filter matching everything
> > 4) $workload with patch, row filter matching few rows
> >
> > For workload I think it'd be worth testing:
> > a) bulk COPY/INSERT into one table
> > b) Many transactions doing small modifications to one table
> > c) Many transactions targetting many different tables
> > d) Interspersed DDL + small changes to a table
> >
>
> I have gathered performance data for the workload case (a):
>
> HEAD 46743.75
> v74 no filters 46929.15
> v74 allow 100% 46926.09
> v74 allow 75% 40617.74
> v74 allow 50% 35744.17
> v74 allow 25% 29468.93
> v74 allow 0% 22540.58
>
> PSA.
>
> This was tested using patch v74 and synchronous pub/sub. There are 1M
> INSERTS for publications using differing amounts of row filtering (or
> none).
>
> Observations:
> - There seem to be insignificant row-filter overheads (e.g. no filter
> and 100% allowed versus HEAD).
> - The elapsed time decreases linearly as there is less data getting 
> replicated.
>

FYI - attached are the test steps I used in case anyone wants to try
to reproduce these results.

--
Kind Regards,
Peter Smith.
Fujitsu Australia
TEST STEPS - Workload case a

1. Run initdb pub and sub and start both postgres instances (use the nosync 
postgresql.conf)

2. Run psql for both instances and create tables
CREATE TABLE test (key int, value text, data jsonb, PRIMARY KEY(key, value));

3. create the PUBLISHER on pub instance (e.g. choose from below depending on 
filter)
CREATE PUBLICATION pub_1 FOR TABLE test;                    -- 100% (no filter)
CREATE PUBLICATION pub_1 FOR TABLE test WHERE (key > 0);    -- 100% allowed
CREATE PUBLICATION pub_1 FOR TABLE test WHERE (key > 25);   -- 75% allowed
CREATE PUBLICATION pub_1 FOR TABLE test WHERE (key > 50);   -- 50% allowed
CREATE PUBLICATION pub_1 FOR TABLE test WHERE (key > 75);   -- 25% allowed
CREATE PUBLICATION pub_1 FOR TABLE test WHERE (key > 100);  -- 0% allowed

4. create the SUBSCRIBER on sub instance
CREATE SUBSCRIPTION sync_sub CONNECTION 'host=127.0.0.1 port=5432 dbname=postgres application_name=sync_sub' PUBLICATION pub_1;

5. On the pub side, modify the publisher's postgresql.conf and restart
\q to quit psql
edit synchronous_standby_names = 'sync_sub'
restart the pub instance

6. Run psql (pub side) and perform the test run.
\timing
INSERT INTO test SELECT i, i::text, row_to_json(row(i)) FROM generate_series(1,101)i;
select count(*) from test;
TRUNCATE test;
select count(*) from test;
repeat 6 for each test run.

~~

Repeat from step 1 for different filter case.




Re: Replace uses of deprecated Python module distutils.sysconfig

2022-01-31 Thread Noah Misch
On Mon, Jan 31, 2022 at 05:18:47PM -0500, Tom Lane wrote:
> [ I was hoping for more opinions, but I guess nobody cares but us ]
> 
> Andres Freund  writes:
> > On 2022-01-27 17:53:02 -0500, Tom Lane wrote:
> >> So now we need to discuss whether we want to back-patch this.
> >> Pros: avoid configure warning now (not worth much); avoid outright
> >> build failure on Python 3.12+ in future.
> >> Cons: breaks compatibility with Python 2.6 and 3.1.

> If nobody else has weighed in by tomorrow, I'll backpatch to v10.

Works for me.  I agree wanting Python 3.12 w/ PG10.latest is far more likely
than wanting Python 2.6 or 3.1.  If someone lodges a non-academic complaint,
we could have back branches fall back to the old way if they detect a Python
version needing it.  I doubt anyone will complain.




Re: Add checkpoint and redo LSN to LogCheckpointEnd log message

2022-01-31 Thread Nathan Bossart
On Tue, Feb 01, 2022 at 12:23:10AM +0530, Bharath Rupireddy wrote:
> On Tue, Feb 1, 2022 at 12:00 AM Nathan Bossart  
> wrote:
>> I think the pg_controldata change needs some extra spaces for alignment,
>> but otherwise these patches seem reasonable to me.
> 
> Thanks. My bad it was. Changed in v6.

LGTM

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: archive modules

2022-01-31 Thread Nathan Bossart
On Sat, Jan 29, 2022 at 09:01:41PM -0800, Nathan Bossart wrote:
> On Sat, Jan 29, 2022 at 04:31:48PM -0800, Nathan Bossart wrote:
>> On Sat, Jan 29, 2022 at 12:50:18PM -0800, Nathan Bossart wrote:
>>> Here is a new revision.  I've moved basic_archive to contrib, hardened it
>>> as suggested, and added shutdown support for archive modules.
>> 
>> cfbot was unhappy with v14, so here's another attempt.  One other change I
>> am pondering is surrounding pgarch_MainLoop() with PG_TRY/PG_FINALLY so
>> that we can also call the shutdown callback in the event of an ERROR.  This
>> might be necessary for an archive module that uses background workers.
> 
> Ugh.  Apologies for the noise.  cfbot still isn't happy, so here's yet
> another attempt.  This new patch set also ensures the shutdown callback is
> called when the archiver process exits.

If basic_archive is to be in contrib, we probably want to avoid restarting
the archiver every time the module ERRORs.  I debated trying to add a
generic exception handler that all archive modules could use, but I suspect
many will have unique cleanup requirements.  Plus, AFAICT restarting the
archiver isn't terrible, it just causes most of the normal retry logic to
be skipped.

I also looked into rewriting basic_archive to avoid ERRORs and return false
for all failures, but this was rather tedious.  Instead, I just introduced
a custom exception handler for basic_archive's archive callback.  This
allows us to ERROR as necessary (which helps ensure that failures show up
in the server logs), and the archiver can treat it like a normal failure
and avoid restarting.
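
The callback contract itself is small.  A minimal sketch of a module
follows; the struct and check_configured_cb names appear in the patch
below, while the init entry-point name and the archive_file_cb signature
are assumptions:

#include "postgres.h"
#include "postmaster/pgarch.h"

static bool
my_check_configured(void)
{
	/* e.g. verify a GUC naming the archive destination is set */
	return true;
}

static bool
my_archive_file(const char *file, const char *path)
{
	/* copy the completed WAL segment at "path" somewhere durable;
	 * returning false (or raising ERROR) makes the archiver retry */
	return true;
}

void
_PG_archive_module_init(ArchiveModuleCallbacks *cb)
{
	cb->check_configured_cb = my_check_configured;
	cb->archive_file_cb = my_archive_file;
}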

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
>From d5f91d973e2fab0951e76c6841e8fd827849a0ae Mon Sep 17 00:00:00 2001
From: Nathan Bossart 
Date: Fri, 19 Nov 2021 01:04:41 +
Subject: [PATCH v17 1/3] Introduce archive modules infrastructure.

---
 src/backend/access/transam/xlog.c |   2 +-
 src/backend/postmaster/pgarch.c   | 111 --
 src/backend/postmaster/shell_archive.c|  24 +++-
 src/backend/utils/init/miscinit.c |   1 +
 src/backend/utils/misc/guc.c  |  12 +-
 src/backend/utils/misc/postgresql.conf.sample |   1 +
 src/include/access/xlog.h |   1 -
 src/include/postmaster/pgarch.h   |  52 +++-
 8 files changed, 189 insertions(+), 15 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index dfe2a0bcce..958220c495 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -8831,7 +8831,7 @@ ShutdownXLOG(int code, Datum arg)
 		 * process one more time at the end of shutdown). The checkpoint
 		 * record will go to the next XLOG file and won't be archived (yet).
 		 */
-		if (XLogArchivingActive() && XLogArchiveCommandSet())
+		if (XLogArchivingActive())
 			RequestXLogSwitch(false);
 
 		CreateCheckPoint(CHECKPOINT_IS_SHUTDOWN | CHECKPOINT_IMMEDIATE);
diff --git a/src/backend/postmaster/pgarch.c b/src/backend/postmaster/pgarch.c
index 6e3fcedc97..865f1930df 100644
--- a/src/backend/postmaster/pgarch.c
+++ b/src/backend/postmaster/pgarch.c
@@ -89,6 +89,8 @@ typedef struct PgArchData
 	slock_t		arch_lck;
 } PgArchData;
 
+char *XLogArchiveLibrary = "";
+
 
 /* --
  * Local data
@@ -96,6 +98,8 @@ typedef struct PgArchData
  */
 static time_t last_sigterm_time = 0;
 static PgArchData *PgArch = NULL;
+static ArchiveModuleCallbacks ArchiveContext;
+
 
 /*
  * Stuff for tracking multiple files to archive from each scan of
@@ -140,6 +144,8 @@ static void pgarch_archiveDone(char *xlog);
 static void pgarch_die(int code, Datum arg);
 static void HandlePgArchInterrupts(void);
 static int ready_file_comparator(Datum a, Datum b, void *arg);
+static void LoadArchiveLibrary(void);
+static void call_archive_module_shutdown_callback(int code, Datum arg);
 
 /* Report shared memory space needed by PgArchShmemInit */
 Size
@@ -244,7 +250,16 @@ PgArchiverMain(void)
 	arch_files->arch_heap = binaryheap_allocate(NUM_FILES_PER_DIRECTORY_SCAN,
 												ready_file_comparator, NULL);
 
-	pgarch_MainLoop();
+	/* Load the archive_library. */
+	LoadArchiveLibrary();
+
+	PG_ENSURE_ERROR_CLEANUP(call_archive_module_shutdown_callback, 0);
+	{
+		pgarch_MainLoop();
+	}
+	PG_END_ENSURE_ERROR_CLEANUP(call_archive_module_shutdown_callback, 0);
+
+	call_archive_module_shutdown_callback(0, 0);
 
 	proc_exit(0);
 }
@@ -407,11 +422,12 @@ pgarch_ArchiverCopyLoop(void)
 			 */
 			HandlePgArchInterrupts();
 
-			/* can't do anything if no command ... */
-			if (!XLogArchiveCommandSet())
+			/* can't do anything if not configured ... */
+			if (ArchiveContext.check_configured_cb != NULL &&
+				!ArchiveContext.check_configured_cb())
 			{
 				ereport(WARNING,
-						(errmsg("archive_mode enabled, yet archive_command is not set")));
+						(errmsg("archive_mode enabled, yet archiving is not configured")));
 				return;
 			}
 
@@ -492,7 

Re: row filtering for logical replication

2022-01-31 Thread Andres Freund
On 2022-01-31 14:12:38 +1100, Greg Nancarrow wrote:
> This array was only ever meant to be read-only, and visible only to
> that function.
> IMO removing "static" makes things worse because now that array gets
> initialized each call to the function, which is unnecessary.
> I think it should just be: "static const int map_changetype_pubaction[] = ..."

Yes, static const is good. static alone, not so much.
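
To spell out the distinction with a trivial, self-contained example (not
the actual pgoutput.c table):

#include <stdio.h>

static int
lookup(int i)
{
	/* "static const": one read-only copy, initialized at program start */
	static const int map[] = {10, 20, 30};

	/* plain "static" would make it writable shared state; no qualifier
	 * at all would rebuild the array on every call */
	return map[i];
}

int
main(void)
{
	printf("%d\n", lookup(1));	/* prints 20 */
	return 0;
}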




Research and Interview on scale-out solutions in Japan

2022-01-31 Thread Yugo NAGATA
Hello, 

I made a wiki page describing research on scale-out solutions and an
interview about scale-out solutions with a few IT companies in
Japan.
https://wiki.postgresql.org/wiki/PGECons_CR_Scaleout_2021

This research and interview were conducted by the community
relations team of PostgreSQL Enterprise Consortium [1]. We want
to share the results and hope that our feedback will be
beneficial to both PostgreSQL users and developers.

[1] PGECons is a non profit organization comprised of companies
in Japan to promote PostgreSQL (https://www.pgecons.org). 
SRA OSS, Inc. is one of the members.

Regards,
Yugo Nagata

-- 
Yugo NAGATA 




Re: GUC flags

2022-01-31 Thread Justin Pryzby
On Mon, Jan 31, 2022 at 02:17:41PM +0900, Michael Paquier wrote:
> With all those doc fixes, applied after an extra round of review.  So
> this makes us rather covered with the checks on the flags.

Thanks

> Now, what do we do with the rest of check_guc that involve a direct
> lookup at what's on disk.  We have the following:
> 1) Check the format of the option lists in guc.c.
> 2) Check the format of postgresql.conf.sample:
> -- Valid options preceded by a '#' character.
> -- Valid options followed by ' =', with at least one space before the
> equal sign.
> 3) Check that options not marked as NOT_IN_SAMPLE are in the sample
> file.
> 
> I have never seen 1) as a problem, and pgindent takes care of that at
> some degree.  2) is also mostly cosmetic, and committers are usually
> careful when adding a new GUC.  3) would be the most interesting
> piece, and would cover most cases if we consider that a default
> installation just copies postgresql.conf.sample over, as proposed
> upthread in 0002.
> 
> Now, 3) has also the problem that it would fail installcheck as one
> can freely add a developer option in the configuration.  We could

I'm not clear on what things are required/prohibited to allow/expect
"installcheck" to pass.  It's possible that postgresql.conf doesn't even exist
in the data dir, right?

It's okay with me if the config_file-reading stuff isn't re-implemented.

-- 
Justin




Re: Proposal: More structured logging

2022-01-31 Thread Greg Stark
1) I would like an interface which more or less guarantees that
*every* parameter of the log message is included in the structured
data. Ideally there should be no actual need to generate the formatted
messages for destinations like Elasticsearch; just record the message
id and the parameters.

To that end I would think something like:

ereport(DEBUG2,
        (errmsg("table \"%s\": removed %lld dead item identifiers in %u pages",
                vacrel->relname, (long long) index, vacuumed_pages)));

Would end up like:

ereport(DEBUG2,
        (errmsg("table \"%s\": removed %lld dead item identifiers in %u pages",
                errtag("relname", vacrel->relname),
                errtag("dead_item_pointers", (long long) index),
                errtag("vacuumed_pages", vacuumed_pages))));

And any untagged parameters could be included in the structured data
with unknown tag names. (Or perhaps some macro magic involving # but
I'm not sure that would be possible with va_args.)
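
For the simple single-expression case the "#" trick does work -- a sketch,
where ERRTAG and the errtag() it expands to are hypothetical:

/* Stringify the argument so the tag name is derived from the expression
 * itself and call sites cannot mislabel it. */
#define ERRTAG(v) errtag(#v, (v))

/* ERRTAG(vacuumed_pages) expands to errtag("vacuumed_pages", (vacuumed_pages)).
 * It works per-argument, though not for capturing a whole varargs list. */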

2) I was kind of hoping we could have some sort of list of known tag
labels that could be used. So instead of an arbitrary string for the
label name you just referenced the known labels. Then we could
document that whenever "relid" appears as a tag id it's the oid of a
relation, for example.

This would help with things like "search Elasticsearch for all log
entries for relid x, block p", which wouldn't work if block is
sometimes "page" and sometimes "blocknum" and sometimes "block".

So I was imagining something like

errtag(TAG_RELID, relid)


3) When it comes to things like queryid it would be helpful to be able
to inject tags from higher in the call stack so that individual
ereport sites don't need to include it and we can decide to add new
things like it in the future. This will be useful for things like
distributed tracing tools who need to inject a span id into the every
log message.

I imagine it could also be useful more generally. Application
developers might want to inject things like a pipeline identifier from
their CI tools and then pull any logs with that identifier into their
CI reports.

Yes, they could be added as new columns in CSV and top level json
attributes but we may well have new tags that are not always there or
are even created on user request. Or we may have cases where there can
be more than one active at a time.

4) As far as code ... this is probably premature microoptimization but
I do find the tags all being individually palloced and all going
through string formatting seems like a lot of overhead. Like the
actual number of errtag() calls in the ereport is always going to be
the same and the values are probably always going to be a single
parameter, not really using the formatting functionality.

I don't necessarily have a better alternative though. The only thing I
can think of isn't really any better:

errtag_str(RELNAME, relname)
errtag_int(RELID, relid)

(possibly with some magic where there's an errtag() function that
looks up which data type to expect for a given tag id).
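
That lookup could be as simple as a static table keyed by tag id -- a
sketch, with all names hypothetical:

typedef enum ErrTagId
{
	TAG_RELID,
	TAG_RELNAME,
	TAG_BLOCKNUM
} ErrTagId;

typedef enum ErrTagType
{
	TAGTYPE_OID,
	TAGTYPE_STRING,
	TAGTYPE_UINT32
} ErrTagType;

/* errtag() (or an assert inside it) can check the value passed for a
 * given tag id against its declared type */
static const ErrTagType errtag_types[] = {
	[TAG_RELID] = TAGTYPE_OID,
	[TAG_RELNAME] = TAGTYPE_STRING,
	[TAG_BLOCKNUM] = TAGTYPE_UINT32
};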

It looks strange to me that the errorTag struct has a "const char
*tagname" but a "char *tagvalue". I think this is a side effect of the
code and not actually a sensible way to define the struct. Surely they
should both be const?




Re: pg_walinspect - a new extension to get raw WAL data and WAL stats

2022-01-31 Thread Justin Pryzby
On Mon, Jan 31, 2022 at 04:40:09PM -0500, Greg Stark wrote:
> 4) This isn't really an issue with your patch at all but why on earth
> do we have a bitvector for WAL compression methods?! Like, what does
> it mean to have multiple compression methods set? That should just be
> a separate field with values for each type of compression surely?

I don't have an answer to your question, but the discussion was here.

In the versions of the patches I sent on Mar 15, Mar 21, May 18, May 24, Jun
13, I avoided "one bit per compression method", but Michael thought this was
simpler.

https://www.postgresql.org/message-id/20210622031358.gf29...@telsasoft.com
On Mon, Jun 21, 2021 at 10:13:58PM -0500, Justin Pryzby wrote:
> +/* compression methods supported */
> +#define BKPIMAGE_COMPRESS_PGLZ 0x04
> +#define BKPIMAGE_COMPRESS_ZLIB 0x08
> +#define BKPIMAGE_COMPRESS_LZ4  0x10
> +#define BKPIMAGE_COMPRESS_ZSTD 0x20
> +#define BKPIMAGE_IS_COMPRESSED(info) \
> +   ((info & (BKPIMAGE_COMPRESS_PGLZ | BKPIMAGE_COMPRESS_ZLIB | \
> +             BKPIMAGE_COMPRESS_LZ4 | BKPIMAGE_COMPRESS_ZSTD)) != 0)
> 
> You encouraged saving bits here, so I'm surprised to see that your patches
> use one bit per compression method: 2 bits to support no/pglz/lz4, 3 to add
> zstd, and the previous patch used 4 bits to also support zlib.
> 
> There are spare bits available for that, but now there can be an inconsistency
> if two bits are set.  Also, 2 bits could support 4 methods (including "no").

On Tue, Jun 22, 2021 at 12:53:46PM +0900, Michael Paquier wrote:
> Yeah, I know.  I have just finished with that to get something
> readable for the sake of the tests.  As you say, the point is moot
> if we just add one new method, anyway, as we need just one new bit.
> And that's what I would like to do for v15 with LZ4 as the resulting
> patch is simple.  It would be an idea to discuss more compression
> methods here once we hear more from users when this is released in the
> field, re-considering at this point if more is necessary or not.
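
For contrast, the two-bit encoding suggested above might look like this;
the mask and shift values are illustrative, not the actual bkpimage flag
layout:

/* Two dedicated bits encode four methods with no inconsistent states. */
typedef enum WalCompression
{
	WAL_COMPRESS_NONE = 0,
	WAL_COMPRESS_PGLZ = 1,
	WAL_COMPRESS_LZ4 = 2,
	WAL_COMPRESS_ZSTD = 3
} WalCompression;

#define BKPIMAGE_COMPRESS_MASK	0x0C	/* bits 2-3 */
#define BKPIMAGE_COMPRESS_SHIFT	2

#define BKPIMAGE_GET_COMPRESSION(info) \
	((WalCompression) (((info) & BKPIMAGE_COMPRESS_MASK) >> BKPIMAGE_COMPRESS_SHIFT))

#define BKPIMAGE_IS_COMPRESSED(info) \
	(BKPIMAGE_GET_COMPRESSION(info) != WAL_COMPRESS_NONE)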




Re: Replace uses of deprecated Python module distutils.sysconfig

2022-01-31 Thread Tom Lane
[ I was hoping for more opinions, but I guess nobody cares but us ]

Andres Freund  writes:
> On 2022-01-27 17:53:02 -0500, Tom Lane wrote:
>> So now we need to discuss whether we want to back-patch this.
>> Pros: avoid configure warning now (not worth much); avoid outright
>> build failure on Python 3.12+ in future.
>> Cons: breaks compatibility with Python 2.6 and 3.1.

> How about adding a note about the change to this set of minor releases, and
> backpatch in the next set?

Meh.  Nobody looks at minor release notes to find out what will happen
in some other minor release.  Moreover, the sort of people who might
be adversely affected are probably not absorbing every minor release
right away, so they'd very likely not see the advance warning anyway.

> I don't see much point in worrying somebody still building plpython with 2.6,
> given its age. I feel a tad more compassion with a future self that wants to
> build a by-then EOL version of postgres, and plpython fails to build. We
> didn't commit to keeping plpython building, but it's in my default build
> script, so ...

Hmm, well, we're certainly not making this change in pre-v10 releases,
so I'm not sure that changing v10 will make things much easier for your
future self.  But it's unusual for us to make back-patching decisions
on the sort of basis I proposed here, so I'm okay with just going back
to v10 instead.

> I vote for backpatching all the way either now, or after the next set of minor
> releases is tagged.

If nobody else has weighed in by tomorrow, I'll backpatch to v10.

regards, tom lane




Re: Support for NSS as a libpq TLS backend

2022-01-31 Thread Daniel Gustafsson
> On 31 Jan 2022, at 22:32, Andres Freund  wrote:
> 
> Hi,
> 
> On 2022-01-31 14:24:03 +0100, Daniel Gustafsson wrote:
>>> On 28 Jan 2022, at 15:30, Robert Haas  wrote:
>>> I would really, really like to have an alternative to OpenSSL for PG.
>>> I don't know if this is the right thing, though. If other people are
>>> dropping support for it, that's a pretty bad sign IMHO. Later in the
>>> thread it says OpenLDAP have dropped support for it already as well.
>> 
>> I'm counting this and Andres' comment as a -1 on the patchset, and given
>> where we are in the cycle I'll mark it rejected in the CF app shortly
>> unless anyone objects.
> 
> I'd make mine more a -0.2 or so. I'm concerned about the lack of non-code
> documentation and the state of code documentation. I'd like an openssl
> alternative, although not as much as a few years ago - it seems that the state
> of openssl has improved compared to most of the other implementations.

IMHO OpenSSL has improved over the OpenSSL of the past - which is great to
see - but they have also diverted themselves into writing a full QUIC
implementation, which *I personally think* is a distraction they don't need.

That being said, there aren't too many other options.

--
Daniel Gustafsson   https://vmware.com/





Re: Support for NSS as a libpq TLS backend

2022-01-31 Thread Daniel Gustafsson
> On 31 Jan 2022, at 17:24, Stephen Frost  wrote:
> * Daniel Gustafsson (dan...@yesql.se) wrote:

>> I'm counting this and Andres' comment as a -1 on the patchset, and given
>> where we are in the cycle I'll mark it rejected in the CF app shortly
>> unless anyone objects.
> 
> I agree that it's concerning to hear that OpenLDAP dropped support for
> NSS... though I don't seem to be able to find any information as to why
> they decided to do so.

I was also unable to find that information.  There is nothing that I could see
in either the commit message, the Bugzilla entry (#9207), or on the mailing list.
Searching the web didn't yield anything either.  I've reached out to hopefully
get a bit more information.

> I'm also very much a fan of having an alternative to OpenSSL and the
> NSS/NSPR license fits well for us, unlike the alternatives to OpenSSL
> used by other projects, such as GnuTLS (which is the alternative to
> OpenSSL that OpenLDAP now has) or other libraries like wolfSSL.

Short of platform specific (proprietary) libraries like Schannel and Secure
Transport, the alternatives are indeed slim.

> Beyond the documentation issue, which I agree is a concern but also
> seems to be actively realized as an issue by the NSS/NSPR folks,

It is, but it has also been an issue for years, to be honest; getting the docs
up to scratch will require a very large effort.

> is there some other reason that the curl folks are thinking of dropping 
> support
> for it?

It's also not really used anymore in conjunction with curl, with Red Hat no
longer shipping builds against it.

--
Daniel Gustafsson   https://vmware.com/





Re: pg_walinspect - a new extension to get raw WAL data and WAL stats

2022-01-31 Thread Greg Stark
So I looked at this patch and I have the same basic question as Bruce.
Do we really want to expose every binary tool associated with Postgres
as an extension? Obviously this is tempting for cloud provider users,
which is not an unreasonable argument. But it does have consequences.

1) Some things like pg_waldump are running code that is not normally
under user control. This could have security issues or reliability
issues.

On that front I'm especially concerned that pg_verify_raw_wal_record()
for example would let an attacker feed arbitrary hand crafted xlog
records into the parser which is not normally something a user can do.
If they feed it something it's not expecting it might be easy to cause
a crash and server restart.

There's also a bit of concern about data retention. Generally in
Postgres, when rows are deleted there are very weak guarantees about the
data really being wiped. We definitely don't wipe it from disk, of
course.
been deleted. But one might imagine after pageinspect shows it's gone
and/or after a vacuum full the data is actually purged. But then
something like pg_walinspect would make even that insufficient.

2) There's no documentation. I'm guessing you hesitated to write
documentation until the interface is settled but actually sometimes
writing documentation helps expose things in the interface that look
strange when you try to explain them.

3) And the interface does look a bit strange. Like, what's the deal
with pg_get_wal_record_info_2()? I gather it's just a SRF version of
pg_get_wal_record_info() but that's a strange name. And then what's
the point of pg_get_wal_record_info() at all? Why wouldn't the SRF be
sufficient even for the specific case of a single record?

And the stats functions seem a bit out of place to me. If the SRF
returned the data in the right format the user should be able to do
aggregate queries to generate these stats easily enough. If anything a
simple SQL function to do the aggregations could be provided.

Now this is starting to get into the realm of bikeshedding but... Some
of the code is taken straight from pg_waldump and does things like:

+ appendStringInfo(_blk_ref, "blkref #%u: rel %u/%u/%u fork %s blk %u",

But that's kind of out of place for an SQL interface. It makes it hard
to write queries since things like the relid, block number etc are in
the string. If I wanted to use these functions I would expect to be
doing something like "select all the decoded records pertaining to
block n".

All that said, I don't want to gatekeep based on this kind of
criticism. The existing code is based on pg_waldump and if we want an
extension to expose that then that's a reasonable place to start. We
can work on a better format for the data later it doesn't mean we
shouldn't start with something we have today.

4) This isn't really an issue with your patch at all but why on earth
do we have a bitvector for WAL compression methods?! Like, what does
it mean to have multiple compression methods set? That should just be
a separate field with values for each type of compression surely?

I suppose this raises the issue of what happens if someone fixes that.
They'll now have to update pg_waldump *and* pg_walinspect. I don't
think that would actually be a lot of work but it's definitely more
than just one. Also, perhaps they should be in the same contrib
directory so at least people won't forget there are two.




Re: Support tab completion for upper character inputs in psql

2022-01-31 Thread Tom Lane
I wrote:
> =?utf-8?Q?Dagfinn_Ilmari_Manns=C3=A5ker?=  writes:
>> First, as noted in the test, it doesn't preserve the case of the input
>> for keywords appended to the query result.  This is easily fixed by
>> using `pg_strdup_keyword_case()`, per the first attached patch.

> I thought about that, and intentionally didn't do it, because it
> would also affect the menus produced by tab completion.
> ...
> We could do something hacky like matching case only when there's
> no longer any matching object names, but that might be too magic.

I experimented with that, and it actually doesn't seem as weird
as I feared.  See if you like this ...

regards, tom lane

diff --git a/src/bin/psql/t/010_tab_completion.pl b/src/bin/psql/t/010_tab_completion.pl
index c4f6552ac9..7a265e0676 100644
--- a/src/bin/psql/t/010_tab_completion.pl
+++ b/src/bin/psql/t/010_tab_completion.pl
@@ -40,7 +40,7 @@ $node->start;
 
 # set up a few database objects
 $node->safe_psql('postgres',
-	"CREATE TABLE tab1 (f1 int primary key, f2 text);\n"
+	"CREATE TABLE tab1 (c1 int primary key, c2 text);\n"
 	  . "CREATE TABLE mytab123 (f1 int, f2 text);\n"
 	  . "CREATE TABLE mytab246 (f1 int, f2 text);\n"
 	  . "CREATE TABLE \"mixedName\" (f1 int, f2 text);\n"
@@ -317,14 +317,30 @@ check_completion(
 
 clear_line();
 
-# check completion of a keyword offered in addition to object names
-# (that code path currently doesn't preserve case of what's typed)
-check_completion(
-	"comment on constraint foo on dom\t",
-	qr|DOMAIN|,
-	"offer keyword in addition to query result");
-
-clear_query();
+# check completion of a keyword offered in addition to object names;
+# such a keyword should obey COMP_KEYWORD_CASE once only keyword
+# completions are possible
+foreach (
+	[ 'lower',  'CO', 'column' ],
+	[ 'upper',  'co', 'COLUMN' ],
+	[ 'preserve-lower', 'co', 'column' ],
+	[ 'preserve-upper', 'CO', 'COLUMN' ],)
+{
+	my ($case, $in, $out) = @$_;
+
+	check_completion(
+		"\\set COMP_KEYWORD_CASE $case\n",
+		qr/postgres=#/,
+		"set completion case to '$case'");
+	check_completion("alter table tab1 rename c\t\t",
+		qr|COLUMN|,
+		"offer keyword COLUMN for input c, COMP_KEYWORD_CASE = $case");
+	clear_query();
+	check_completion("alter table tab1 rename $in\t\t\t",
+		qr|$out|,
+		"offer keyword $out for input $in, COMP_KEYWORD_CASE = $case");
+	clear_query();
+}
 
 # send psql an explicit \q to shut it down, else pty won't close properly
 $timer->start(5);
diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c
index b2ec50b4f2..bdc9760fba 100644
--- a/src/bin/psql/tab-complete.c
+++ b/src/bin/psql/tab-complete.c
@@ -4742,7 +4742,8 @@ _complete_from_query(const char *simple_query,
 {
 	static int	list_index,
 				num_schema_only,
-				num_other;
+				num_other,
+				num_keywords;
 	static PGresult *result = NULL;
 	static bool non_empty_object;
 	static bool schemaquoted;
@@ -4766,6 +4767,7 @@ _complete_from_query(const char *simple_query,
 		list_index = 0;
 		num_schema_only = 0;
 		num_other = 0;
+		num_keywords = 0;
 		PQclear(result);
 		result = NULL;
 
@@ -4986,7 +4988,10 @@ _complete_from_query(const char *simple_query,
 
 			/* In verbatim mode, we return all the items as-is */
 			if (verbatim)
+			{
+				num_other++;
 				return pg_strdup(item);
+			}
 
 			/*
 			 * In normal mode, a name requiring quoting will be returned only
@@ -5031,8 +5036,12 @@ _complete_from_query(const char *simple_query,
 				list_index++;
 				if (pg_strncasecmp(text, item, strlen(text)) == 0)
 				{
-					num_other++;
-					return pg_strdup(item);
+					num_keywords++;
+					/* Match keyword case if we are returning only keywords */
+					if (num_schema_only == 0 && num_other == 0)
+						return pg_strdup_keyword_case(item, text);
+					else
+						return pg_strdup(item);
 				}
 			}
 		}
@@ -5049,8 +5058,12 @@ _complete_from_query(const char *simple_query,
 				list_index++;
 				if (pg_strncasecmp(text, item, strlen(text)) == 0)
 				{
-					num_other++;
-					return pg_strdup(item);
+					num_keywords++;
+					/* Match keyword case if we are returning only keywords */
+					if (num_schema_only == 0 && num_other == 0)
+						return pg_strdup_keyword_case(item, text);
+					else
+						return pg_strdup(item);
 				}
 			}
 		}
@@ -5062,7 +5075,7 @@ _complete_from_query(const char *simple_query,
 	 * completion subject text, which is not what we want.
 	 */
 #ifdef HAVE_RL_COMPLETION_APPEND_CHARACTER
-	if (num_schema_only > 0 && num_other == 0)
+	if (num_schema_only > 0 && num_other == 0 && num_keywords == 0)
 		rl_completion_append_character = '\0';
 #endif
 


Re: Support for NSS as a libpq TLS backend

2022-01-31 Thread Andres Freund
Hi,

On 2022-01-31 14:24:03 +0100, Daniel Gustafsson wrote:
> > On 28 Jan 2022, at 15:30, Robert Haas  wrote:
> > I would really, really like to have an alternative to OpenSSL for PG.
> > I don't know if this is the right thing, though. If other people are
> > dropping support for it, that's a pretty bad sign IMHO. Later in the
> > thread it says OpenLDAP have dropped support for it already as well.
> 
> I'm counting this and Andres' comment as a -1 on the patchset, and given where
> we are in the cycle I'll mark it rejected in the CF app shortly unless anyone
> objects.

I'd make mine more a -0.2 or so. I'm concerned about the lack of non-code
documentation and the state of code documentation. I'd like an openssl
alternative, although not as much as a few years ago - it seems that the state
of openssl has improved compared to most of the other implementations.

Greetings,

Andres Freund




Re: CREATEROLE and role ownership hierarchies

2022-01-31 Thread Mark Dilger



> On Jan 31, 2022, at 10:50 AM, Stephen Frost  wrote:
> 
> Supporting that through ADMIN is one option, another would be a
> 'DROPROLE' attribute, though we'd want a way to curtail that from being
> able to be used for just any role and that does lead down a path similar
> to ownership or just generally the concept that some roles have certain
> rights over certain other roles (whether you create them or not...).

I've been operating under the assumption that I have a lot more freedom to 
create new features than to change how existing features behave, for two 
reasons: backwards compatibility and SQL-spec compliance.

Changing how having ADMIN on a role works seems problematic for both those 
reasons.  My family got me socks for Christmas, not what I actually wanted, a 
copy of the SQL-spec.  So I'm somewhat guessing here.  But I believe we'd have 
problems if we "fixed" the part where a role can revoke ADMIN from others on 
themselves.  Whatever we have, whether we call it "ownership", it can't be 
something a role can unilaterally revoke.

As for a 'DROPROLE' attribute, I don't think that gets us anywhere.  You don't 
seem to think so, either.  So that leaves us with "ownership", perhaps by 
another word?  I only chose that word because it's what we use elsewhere, but 
if we want to call it "managementship" and "manager" or whatever, that's fine.  
I'm not to the point of debating the terminology just yet.  I'm still trying to 
get the behavior nailed down.

> I do think there's a lot of value in being able to segregate certain
> rights- consider that you may want a role that's able to create other
> roles, perhaps grant them into some set of roles, can lock those roles
> (prevent them from logging in, maybe do a password reset, something like
> that), but which *isn't* able to drop those roles (and all their
> objects) as that's dangerous and mistakes can certainly happen, or be
> able to become that role because the creating role simply doesn't have
> any need to be able to do that (or desire to in many cases, as we
> discussed in the landlord-vs-tenant sub-thread).

I'm totally on the same page.  Your argument upthread about wanting any 
malfeasance on the part of a service provider showing up in the audit logs was 
compelling.  Even for those things the "owner"/"manager" has the rights to do, 
we might want to make them choose to do it explicitly and not merely do it by 
accident.

> Naturally, you'd want *some* role to be able to drop that role (and one
> that doesn't have full superuser access) but that might be a role that's
> not able to create new roles or take over accounts.

I think it's important to go beyond the idea of a role attribute here.  It's 
not that role "bob" can drop roles.  It's that "bob" can drop *specific* roles, 
and for that, there has to be some kind of dependency tracked between "bob" and 
those other roles.  I'm calling that "ownership".  I think that language isn't 
just arbitrary, but actually helpful (technically, not politically) because 
REASSIGN OWNED should treat this kind of relationship exactly the same as it 
treats ownership of schemas, tables, functions, etc.

> Separation of concerns and powers and all of that is what we want to be
> going for here, more generically, which is why I was opposed to the
> blanket "owners have all rights of all roles they own" implementation.

I'm hoping to bring back, in v9, the idea of ownership/managership.  The real 
sticking point here is that we (Robert, Andrew, I, and possibly others) want to 
be able to drop a non-superuser creator role into existing systems that use 
superuser for role management.  We'd like the switch to be as transparent as 
possible.

With a superuser creating a role, that superuser can come back and muck with 
the role afterward, and the role can't revoke the superuser's right to do so.  
It's not enough that a non-superuser-creator-role (henceforth, "manager") can 
grant itself ADMIN on the created role.  It also needs to be able to set 
passwords, transfer object ownerships to/from the role, grant the role into 
other roles or other roles into it, etc.  All of that has to be sandboxed such 
that the "manager" can't touch stuff outside the manager's sandbox, but within 
the sandbox, it shouldn't make any practical difference that the manager isn't 
actually a superuser.

I think what I had in v7 was almost right.  I'm hoping that we just need to 
adjust things like the idea that managers always have implicit membership in 
and ADMIN on roles they manage.  I think that needs to be optional, and the 
audit logs could show if the manager granted themselves such things, as it 
might violate policy and be a red flag in the audit log.

> That approach doesn't support the ability to have a relatively
> unprivileged role that's able to create other roles, which seems like a
> pretty important use-case for us to be considering.

I think we have that ability. It's just that the creator role isn't 

Re: Deparsing rewritten query

2022-01-31 Thread Pavel Stehule
On Mon, Jan 31, 2022 at 19:09, Julien Rouhaud 
wrote:

> Hi,
>
> On Mon, Jan 31, 2022 at 06:46:37PM +0100, Pavel Stehule wrote:
> >
> > I checked the last patch.  I think it is almost trivial. I miss just
> > comment, why this alias is necessary
> >
> > + if (!rte->alias)
> > + rte->alias = makeAlias(get_rel_name(rte->relid), NULL);
>
> Thanks for looking at it Pavel!
>
> The alias is necessary because otherwise queries involving views won't
> produce valid SQL, as aliases for subqueries are mandatory.  This was part
> of the v1 regression tests:
>
> +-- test pg_get_query_def()
> +SELECT pg_get_query_def('SELECT * FROM shoe') as def;
> +  def
> +
> +  SELECT shoename, +
> + sh_avail, +
> + slcolor,  +
> + slminlen, +
> + slminlen_cm,  +
> + slmaxlen, +
> + slmaxlen_cm,  +
> + slunit+
> +FROM ( SELECT sh.shoename, +
> + sh.sh_avail,  +
> + sh.slcolor,   +
> + sh.slminlen,  +
> + (sh.slminlen * un.un_fact) AS slminlen_cm,+
> + sh.slmaxlen,  +
> + (sh.slmaxlen * un.un_fact) AS slmaxlen_cm,+
> + sh.slunit +
> +FROM shoe_data sh, +
> + unit un   +
> +   WHERE (sh.slunit = un.un_name)) shoe;   +
>
> the mandatory "shoe" alias is added with that change.
>

> I looked for other similar problems and didn't find anything, but given the
> complexity of the SQL standard it's quite possible that I missed some other
> corner case.
>

I don't feel good about forcing an alias. relname doesn't ensure
uniqueness. You can have two views with the same name in different
schemas. Moreover, this field is necessary only when a deparsed query is
printed, not always.

Isn't it possible to compute the correct subquery alias at print time when
it is missing?

Regards

Pavel


Re: CREATEROLE and role ownership hierarchies

2022-01-31 Thread Michael Banck
Hi,

Am Montag, dem 31.01.2022 um 09:18 -0800 schrieb Mark Dilger:
> On Jan 31, 2022, at 12:43 AM, Michael Banck < 
> michael.ba...@credativ.de> wrote:

> Ok, sure. I think this topic is hugely important and as I read the
> patch anyway, I added some comments, but yeah, we need to figure out
> the fundamentals first.

Right.

Perhaps some background on this patch series will help.  
[...]

Thanks a lot!


If we don't go with backwards compatibility, then CREATEROLE would only
allow you to create a new role, but not to give that role LOGIN, nor
CREATEDB, etc.  You'd need to also have admin option on those things. 
To create a role that can give those things away, you'd need to run
something like:

CREATE ROLE michael
    CREATEROLE WITH ADMIN OPTION        -- can further give away "createrole"
    CREATEDB WITH ADMIN OPTION          -- can further give away "createdb"
    LOGIN WITH ADMIN OPTION             -- can further give away "login"
    NOREPLICATION WITHOUT ADMIN OPTION  -- this would be implied anyway
    NOBYPASSRLS WITHOUT ADMIN OPTION    -- this would be implied anyway
    CONNECTION LIMIT WITH ADMIN OPTION  -- can specify connection limits
    PASSWORD WITH ADMIN OPTION          -- can specify passwords
    VALID UNTIL WITH ADMIN OPTION       -- can specify expiration

Those last three don't work for me:

postgres=# CREATE ROLE admin3 VALID UNTIL WITH ADMIN OPTION;
ERROR:  syntax error at or near "WITH"

postgres=# CREATE ROLE admin3 PASSWORD WITH ADMIN OPTION;
ERROR:  syntax error at or near "WITH"

postgres=# CREATE ROLE admin3 CONNECTION LIMIT WITH ADMIN OPTION;
ERROR:  syntax error at or near "WITH"

> (I'm on the fence about the phrase "WITH ADMIN OPTION" vs. the phrase
> "WITH GRANT OPTION".)
> 
> Even then, when "michael" creates new roles, if he wants to be able
> to further administer those roles, he needs to remember to give
> himself ADMIN membership in that role at creation time.  After the
> role is created, if he doesn't have ADMIN, he can't give it to
> himself.  So, at create time, he needs to remember to do this:
> 
> SET ROLE michael;
> CREATE ROLE mark ADMIN michael;

What would happen if ADMIN was implicit when michael is a non-superuser
and there's no ADMIN in the CREATE ROLE statement? It would be
backwards-compatible, one could still let somebody else be ADMIN, but
ISTM a CREATEROLE role could no longer admin a role that already existed
/ that it didn't create / that it wasn't assigned admin for (e.g. the
predefined roles).

I.e. (responding what you wrote much further below), the CREATEROLE
role would no longer be ADMIN for all roles, just automatically for the
ones it created.

> But that's still a bit strange, because "ADMIN michael" means that
> michael can grant other roles membership in "mark", not that michael
> can, for example, change mark's password.

Yeah, changing a password is one of the important tasks of a delegated
role admin, if no superusers are around.

> If we don't want CREATEROLE to imply that you can mess around with
> arbitrary roles (rather than only roles that you created or have been
> transferred control over) then we need the concept of role
> ownership.  This patch doesn't go that far, so for now, only
> superusers can do those things.  Assuming some form of this patch is
> acceptable, the v9 series will resurrect some of the pre-v7 logic for
> role ownership and say that the owner can do those things.

> > Ok, so what I would have needed to do in the above in order to have
> > "admin2" and "admin" be the same as far as creating login users is (I
> > believe):
> > 
> > ALTER ROLE admin2 CREATEROLE LOGIN WITH ADMIN OPTION;
> 
> Yes, those it's more likely admin2 would have been created with these
> privileges to begin with, if the creator intended admin2 to do such
> things.

Right, maybe people just have to adjust to the new way. It still feels
strange that whatever you do at role creation time is more meaningful
than what you do when altering a role.

> 
> > By the way, is there now even a way to add admpassword to a role
> > after it got created?
> > 
> > postgres=# SET ROLE admin2;
> > SET
> > postgres=> \password test
> > Enter new password for user "test": 
> > Enter it again: 
> > ERROR:  must have admin on password to change password attribute
> > postgres=> RESET ROLE;
> > RESET
> > postgres=# ALTER ROLE admin2 PASSWORD WITH ADMIN OPTION;
> > ERROR:  syntax error at or near "WITH"
> > UPDATE pg_authid SET admpassword = 't' WHERE rolname = 'admin2';
> > UPDATE 1
> > postgres=# SET ROLE admin2;
> > SET
> > postgres=> \password test
> > Enter new password for user "test": 
> > Enter it again: 
> > postgres=> 
> 
> I don't really have this worked out yet.  That's mostly because I'm
> planning to fix it with role ownership, but perhaps there is a better
> way?

Well, see above; maybe the patch is just broken/unfinished with respect
to PASSWORD and the others? It works for REPLICATION e.g.:

postgres=# ALTER ROLE admin2 REPLICATION WITH ADMIN OPTION;

Re: PITR: enhance getRecordTimestamp()

2022-01-31 Thread Simon Riggs
On Thu, 27 Jan 2022 at 06:58, Michael Paquier  wrote:
>
> On Wed, Nov 03, 2021 at 04:59:04PM +, Simon Riggs wrote:
> > Thanks. Fixed and rebased.
>
> +   if (xact_info == XLOG_XACT_PREPARE)
> +   {
> +       if (recoveryTargetUseOriginTime)
> +       {
> +           xl_xact_prepare *xlrec = (xl_xact_prepare *) XLogRecGetData(record);
> +           xl_xact_parsed_prepare parsed;
> +
> +           ParsePrepareRecord(XLogRecGetInfo(record),
> +                              xlrec,
> +                              &parsed);
> +           *recordXtime = parsed.origin_timestamp;
> +       }
> +       else
> +           *recordXtime = ((xl_xact_prepare *) XLogRecGetData(record))->prepared_at;
>
> As I learnt recently with ece8c76, there are cases where an origin
> timestamp may not be set in the WAL record that includes the origin
> timestamp depending on the setup done on the origin cluster.  Isn't
> this code going to finish by returning true when enabling
> recovery_target_use_origin_time in some cases, even if recordXtime is
> 0?  So it seems to me that this is lacking some sanity checks if
> recordXtime is 0.
>
> Could you add some tests for this proposal?  This adds various PITR
> scenarios that would be uncovered, and TAP should be able to cover
> that.

Thanks. Yes, will look at that.
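
The guard would presumably be something along these lines, sketched
against the variable names in the quoted hunk:

/* An origin timestamp may be unset depending on the origin cluster's
 * configuration, so treat zero as "no usable timestamp in this record". */
if (recoveryTargetUseOriginTime && *recordXtime == 0)
	return false;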

-- 
Simon Riggs                http://www.EnterpriseDB.com/




Re: Multiple Query IDs for a rewritten parse tree

2022-01-31 Thread Andrey Lepikhov

On 29/1/2022 12:51, Julien Rouhaud wrote:

4. We should reserve position of default in-core generator


 From the discussion above I was under the impression that the core
generator should be distinguished by a predefined kind.


I don't really like this approach.  IIUC, this patch as-is is meant to break
pg_stat_statements extensibility.  If kind == 0 is reserved for in-core queryid
then you can't use pg_stat_statement with a different queryid generator
anymore.


Yes, it is one more problem. Maybe if we want to make it extensible, we 
could think about hooks in the pg_stat_statements too?



The patch also reserves kind == -1 for pg_stat_statements internal purpose, and
I'm not really sure why that's needed.
My idea: tags with positive numbers are reserved for generation
results, which are performance-critical.
As I saw during the implementation, pg_stat_statements makes additional
changes to the queryId (no matter which ones). Because our purpose is to
avoid interference in this place, I invented negative values, where
extensions can store their queryIds, based on any generator or none.
Maybe it is redundant - the main idea here was to highlight the issue.


I'm also unsure of how extensions are supposed to cooperate in general, as
I feel that queryids should be implemented for some "intent" (like monitoring,
planning optimization...).  That being said, I'm not sure if e.g. AQO heuristics
are too specific for its needs or if they could be shared with other extensions
that might be dealing with similar concerns.

I think, it depends on a specific purpose of an extension.

--
regards,
Andrey Lepikhov
Postgres Professional




Re: CREATEROLE and role ownership hierarchies

2022-01-31 Thread Joshua Brindle
On Mon, Jan 31, 2022 at 1:50 PM Stephen Frost  wrote:
>
> Greetings,
>
> * Mark Dilger (mark.dil...@enterprisedb.com) wrote:
> > > On Jan 31, 2022, at 8:53 AM, Stephen Frost  wrote:
> > > Yeah, we do need to have a way to determine who is allowed to drop
> > > roles and role ownership seems like it's one possible approach to that.
> >
> > Which other ways are on the table?  Having ADMIN on a role doesn't allow 
> > you to do that, but maybe it could?  What else?
>
> Supporting that through ADMIN is one option, another would be a
> 'DROPROLE' attribute, though we'd want a way to curtail that from being
> able to be used for just any role and that does lead down a path similar
> to ownership or just generally the concept that some roles have certain
> rights over certain other roles (whether you create them or not...).
>
> I do think there's a lot of value in being able to segregate certain
> rights- consider that you may want a role that's able to create other
> roles, perhaps grant them into some set of roles, can lock those roles
> (prevent them from logging in, maybe do a password reset, something like
> that), but which *isn't* able to drop those roles (and all their
> objects) as that's dangerous and mistakes can certainly happen, or be
> able to become that role because the creating role simply doesn't have
> any need to be able to do that (or desire to in many cases, as we
> discussed in the landlord-vs-tenant sub-thread).
>

This is precisely the use case I am trying to accomplish with this
patchset, roughly:

- An automated bot that creates users and adds them to the employees role
- Bot cannot access any employee (or other roles) table data
- Bot cannot become any employee
- Bot can disable the login of any employee

Yes, there are attack surfaces around the fringes of login, etc., but
those can be mitigated with certificate authentication. My pg_hba
would require any role in the employees role to use cert auth.

This would adequately mitigate many threats while greatly enhancing
user management.

> Naturally, you'd want *some* role to be able to drop that role (and one
> that doesn't have full superuser access) but that might be a role that's
> not able to create new roles or take over accounts.

I suspect some kind of web backend would handle manual user pruning. I
don't expect Bot to automatically drop users, because mistakes can
happen, and disabling the login ability seems like an adequate
tradeoff.

> Separation of concerns and powers and all of that is what we want to be
> going for here, more generically, which is why I was opposed to the
> blanket "owners have all rights of all roles they own" implementation.
> That approach doesn't support the ability to have a relatively
> unprivileged role that's able to create other roles, which seems like a
> pretty important use-case for us to be considering.

Agreed.

> The terminology seems to also be driving us in a certain direction and I
> don't know that it's necessarily a good one.  That is- the term 'owner'
> implies certain things and maybe that's where some of the objection to
> my argument that owners shouldn't necessarily have all rights of the
> roles they 'own' comes from (ok- I'll also put out there for general
> consideration that since we're talking about roles, and login roles are
> generally associated with people, that maybe 'owner' isn't a great term
> to use for this anyway ...).  I feel like the 'owner' concept came from
> the way we have table owners and function owners and database owners
> today rather than from a starting point of what do we wish to
> specifically enable.
>
> Perhaps instead of starting from the 'owner' concept, we start from the
> question about the kinds of things we want roles to be able to do and
> perhaps that will help inform the terminology.
>
> - Create new roles
> - Drop an existing role
> - Drop objects which belong to a role
> - Lock existing roles
> - Change/reset the PW of existing roles
> - Give roles to other roles
> - Revoke access to some roles from other roles
> - Give select role attributes to a role
> - Revoke role attributes from a role
> - Traditional role-based access control (group memberships, SET ROLE)
>
> Certain of the above are already covered by the existing role membership
> system and with the admin option, though there's definitely an argument
> to be made as to if that is as capable as we'd like it to be (there's no
> way to, today at least, GRANT *just* the admin option, for example, and
> maybe that's something that it would actually be sensible to support).
>
> Perhaps there is a need to have a user who has all of the above
> capabilities and maybe that would be an 'owner' or 'manager', but as I
> tried to illustrate above, there's definitely use-cases for giving
> a role only some of the above capabilities rather than all of them
> together at once.
>
> > >> The main idea here is that having CREATEROLE doesn't give you ADMIN on 
> > >> roles, nor on role attributes.  For 

Re: Add checkpoint and redo LSN to LogCheckpointEnd log message

2022-01-31 Thread Bharath Rupireddy
On Tue, Feb 1, 2022 at 12:00 AM Nathan Bossart  wrote:
>
> On Mon, Jan 31, 2022 at 11:45:19PM +0530, Bharath Rupireddy wrote:
> > Thanks. Here are 2 patches, 0001 for adding checkpoint lsn and redo
> > lsn in the checkpoint completed message and 0002 for changing the
> > "location" to LSN in pg_controdata's output. With the 0002,
> > pg_control_checkpont, pg_controldata and checkpoint completed message
> > will all be in sync with the checkpoint lsn and redo lsn.
>
> I think the pg_controldata change needs some extra spaces for alignment,
> but otherwise these patches seem reasonable to me.

Thanks. My bad it was. Changed in v6.

Regards,
Bharath Rupireddy.


v6-0001-Add-checkpoint-and-redo-LSN-to-LogCheckpointEnd-l.patch
Description: Binary data


v6-0002-change-location-to-lsn-in-pg_controldata.patch
Description: Binary data


Re: CREATEROLE and role ownership hierarchies

2022-01-31 Thread Stephen Frost
Greetings,

* Mark Dilger (mark.dil...@enterprisedb.com) wrote:
> > On Jan 31, 2022, at 8:53 AM, Stephen Frost  wrote:
> > Yeah, we do need to have a way to determine who is allowed to drop
> > roles and role ownership seems like it's one possible approach to that.
> 
> Which other ways are on the table?  Having ADMIN on a role doesn't allow you 
> to do that, but maybe it could?  What else?

Supporting that through ADMIN is one option, another would be a
'DROPROLE' attribute, though we'd want a way to curtail that from being
able to be used for just any role and that does lead down a path similar
to ownership or just generally the concept that some roles have certain
rights over certain other roles (whether you create them or not...).

I do think there's a lot of value in being able to segregate certain
rights- consider that you may want a role that's able to create other
roles, perhaps grant them into some set of roles, can lock those roles
(prevent them from logging in, maybe do a password reset, something like
that), but which *isn't* able to drop those roles (and all their
objects) as that's dangerous and mistakes can certainly happen, or be
able to become that role because the creating role simply doesn't have
any need to be able to do that (or desire to in many cases, as we
discussed in the landlord-vs-tenant sub-thread).

Naturally, you'd want *some* role to be able to drop that role (and one
that doesn't have full superuser access) but that might be a role that's
not able to create new roles or take over accounts.

Separation of concerns and powers and all of that is what we want to be
going for here, more generically, which is why I was opposed to the
blanket "owners have all rights of all roles they own" implementation.
That approach doesn't support the ability to have a relatively
unprivileged role that's able to create other roles, which seems like a
pretty important use-case for us to be considering.

The terminology seems to also be driving us in a certain direction and I
don't know that it's necessarily a good one.  That is- the term 'owner'
implies certain things and maybe that's where some of the objection to
my argument that owners shouldn't necessarily have all rights of the
roles they 'own' comes from (ok- I'll also put out there for general
consideration that since we're talking about roles, and login roles are
generally associated with people, that maybe 'owner' isn't a great term
to use for this anyway ...).  I feel like the 'owner' concept came from
the way we have table owners and function owners and database owners
today rather than from a starting point of what do we wish to
specifically enable.

Perhaps instead of starting from the 'owner' concept, we start from the
question about the kinds of things we want roles to be able to do and
perhaps that will help inform the terminology.

- Create new roles
- Drop an existing role
- Drop objects which belong to a role
- Lock existing roles
- Change/reset the PW of existing roles
- Give roles to other roles
- Revoke access to some roles from other roles
- Give select role attributes to a role
- Revoke role attributes from a role
- Traditional role-based access control (group memberships, SET ROLE)

Certain of the above are already covered by the existing role membership
system and with the admin option, though there's definitely an argument
to be made as to if that is as capable as we'd like it to be (there's no
way to, today at least, GRANT *just* the admin option, for example, and
maybe that's something that it would actually be sensible to support).
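
To make that concrete: today the admin option can only ride along with
membership, e.g. something like (role names here are mine, just for
illustration):

GRANT pg_read_all_data TO alice WITH ADMIN OPTION;

which necessarily also makes alice a member of pg_read_all_data.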

Perhaps there is a need to have a user who has all of the above
capabilities and maybe that would be an 'owner' or 'manager', but as I
tried to illustrate above, there's definitely use-cases for giving
a role only some of the above capabilities rather than all of them
together at once.

> >> The main idea here is that having CREATEROLE doesn't give you ADMIN on 
> >> roles, nor on role attributes.  For role attributes, the syntax has been 
> >> extended.  An excerpt from the patch's regression test illustrates some of 
> >> that concept:
> >> 
> >> -- ok, superuser can create a role that can create login replication 
> >> users, but
> >> -- cannot itself login, nor perform replication
> >> CREATE ROLE regress_role_repladmin
> >>CREATEROLE WITHOUT ADMIN OPTION -- can create roles, but cannot 
> >> give it away
> >>NOCREATEDB WITHOUT ADMIN OPTION -- cannot create db, nor give it 
> >> away
> >>NOLOGIN WITH ADMIN OPTION   -- cannot log in, but can give it 
> >> away
> >>NOREPLICATION WITH ADMIN OPTION -- cannot replicate, but can give 
> >> it away
> >>NOBYPASSRLS WITHOUT ADMIN OPTION;   -- cannot bypassrls, nor give it 
> >> away
> >> 
> >> -- ok, superuser can create a role with CREATEROLE but restrict give-aways
> >> CREATE ROLE regress_role_minoradmin
> >>NOSUPERUSER -- WITHOUT ADMIN OPTION is implied
> >>

Re: Add checkpoint and redo LSN to LogCheckpointEnd log message

2022-01-31 Thread Nathan Bossart
On Mon, Jan 31, 2022 at 11:45:19PM +0530, Bharath Rupireddy wrote:
> Thanks. Here are 2 patches, 0001 for adding checkpoint lsn and redo
> lsn in the checkpoint completed message and 0002 for changing the
> "location" to LSN in pg_controdata's output. With the 0002,
> pg_control_checkpont, pg_controldata and checkpoint completed message
> will all be in sync with the checkpoint lsn and redo lsn.

I think the pg_controldata change needs some extra spaces for alignment,
but otherwise these patches seem reasonable to me.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: Add checkpoint and redo LSN to LogCheckpointEnd log message

2022-01-31 Thread Bharath Rupireddy
On Mon, Jan 31, 2022 at 8:41 PM Stephen Frost  wrote:
>
> > Thanks for your review. In summary, we have these options to choose
> > checkpoint LSN and last REDO LSN:
> >
> > 1)  "start=%X/%X, end=%X/%X" (ControlFile->checkPointCopy.redo,
> > ControlFile->checkPoint)
> > 2)  "lsn=%X/%X, redo lsn=%X/%X"
> > 3) "location=%X/%X, REDO location=%X/%X" -- similar to what
> > pg_controldata and pg_control_checkpoint shows currently.
> > 4) "location=%X/%X, REDO start location=%X/%X"
> >
> > I will leave that decision to the committer.
>
> I'd also vote for #2.  Regarding 3 and 4, I'd argue that those should
> have been changed when we changed a number of other things from the
> generic 'location' to be 'lsn' in system views and functions, and
> therefore we should go change those to also specify 'lsn' rather than
> just saying 'location'.

Thanks. Here are 2 patches, 0001 for adding checkpoint lsn and redo
lsn in the checkpoint completed message and 0002 for changing the
"location" to LSN in pg_controdata's output. With the 0002,
pg_control_checkpont, pg_controldata and checkpoint completed message
will all be in sync with the checkpoint lsn and redo lsn.
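
With 0001 applied, the completion message would then look something like
this (the LSN values here are made up for illustration):

LOG:  checkpoint complete: lsn=0/50002D8, redo lsn=0/50002A0; wrote 3
buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.001 s,
sync=0.001 s, total=0.002 s; sync files=2, longest=0.001 s,
average=0.001 s; distance=0 kB, estimate=0 kB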

Regards,
Bharath Rupireddy.
From beafea457cd80266c6e517862ed819da7b3c8ec0 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy 
Date: Mon, 31 Jan 2022 16:51:47 +
Subject: [PATCH v5] Add checkpoint and redo LSN to LogCheckpointEnd log
 message

It is useful (for debugging purposes) if the checkpoint end message
has the checkpoint LSN(end) and REDO LSN(start). It gives more
context while analyzing checkpoint-related issues. The pg_controldata
gives the last checkpoint LSN and REDO LSN, but having this info
alongside the log message helps analyze issues that happened
previously, connect the dots and identify the root cause.
---
 src/backend/access/transam/xlog.c | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index dfe2a0bcce..71a5239619 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -8919,11 +8919,14 @@ LogCheckpointEnd(bool restartpoint)
 
 	if (restartpoint)
 		ereport(LOG,
-(errmsg("restartpoint complete: wrote %d buffers (%.1f%%); "
+(errmsg("restartpoint complete: lsn=%X/%X, redo lsn=%X/%X; "
+		"wrote %d buffers (%.1f%%); "
 		"%d WAL file(s) added, %d removed, %d recycled; "
 		"write=%ld.%03d s, sync=%ld.%03d s, total=%ld.%03d s; "
 		"sync files=%d, longest=%ld.%03d s, average=%ld.%03d s; "
 		"distance=%d kB, estimate=%d kB",
+		LSN_FORMAT_ARGS(ControlFile->checkPointCopy.redo),
+		LSN_FORMAT_ARGS(ControlFile->checkPoint),
 		CheckpointStats.ckpt_bufs_written,
 		(double) CheckpointStats.ckpt_bufs_written * 100 / NBuffers,
 		CheckpointStats.ckpt_segs_added,
@@ -8939,11 +8942,14 @@ LogCheckpointEnd(bool restartpoint)
 		(int) (CheckPointDistanceEstimate / 1024.0))));
 	else
 		ereport(LOG,
-(errmsg("checkpoint complete: wrote %d buffers (%.1f%%); "
+(errmsg("checkpoint complete: lsn=%X/%X, redo lsn=%X/%X; "
+		"wrote %d buffers (%.1f%%); "
 		"%d WAL file(s) added, %d removed, %d recycled; "
 		"write=%ld.%03d s, sync=%ld.%03d s, total=%ld.%03d s; "
 		"sync files=%d, longest=%ld.%03d s, average=%ld.%03d s; "
 		"distance=%d kB, estimate=%d kB",
+		LSN_FORMAT_ARGS(ControlFile->checkPointCopy.redo),
+		LSN_FORMAT_ARGS(ControlFile->checkPoint),
 		CheckpointStats.ckpt_bufs_written,
 		(double) CheckpointStats.ckpt_bufs_written * 100 / NBuffers,
 		CheckpointStats.ckpt_segs_added,
-- 
2.25.1

From fd466fb14f16db15292974b475338335668148bf Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy 
Date: Mon, 31 Jan 2022 17:50:40 +
Subject: [PATCH v5] change "location" to "lsn" in pg_controldata

---
 src/bin/pg_controldata/pg_controldata.c| 10 +-
 src/test/recovery/t/016_min_consistency.pl |  2 +-
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/src/bin/pg_controldata/pg_controldata.c b/src/bin/pg_controldata/pg_controldata.c
index f911f98d94..0c5ef7804c 100644
--- a/src/bin/pg_controldata/pg_controldata.c
+++ b/src/bin/pg_controldata/pg_controldata.c
@@ -235,9 +235,9 @@ main(int argc, char *argv[])
 		   dbState(ControlFile->state));
 	printf(_("pg_control last modified: %s\n"),
 		   pgctime_str);
-	printf(_("Latest checkpoint location:   %X/%X\n"),
+	printf(_("Latest checkpoint LSN:   %X/%X\n"),
 		   LSN_FORMAT_ARGS(ControlFile->checkPoint));
-	printf(_("Latest checkpoint's REDO location:%X/%X\n"),
+	printf(_("Latest checkpoint's REDO LSN:%X/%X\n"),
 		   LSN_FORMAT_ARGS(ControlFile->checkPointCopy.redo));
 	printf(_("Latest checkpoint's REDO WAL file:%s\n"),
 		   xlogfilename);
@@ -274,13 +274,13 @@ main(int argc, char *argv[])
 		   ckpttime_str);
 	printf(_("Fake LSN counter for unlogged rels:   %X/%X\n"),
 		   

Re: missing indexes in indexlist with partitioned tables

2022-01-31 Thread Arne Roland
Hi!

From: Amit Langote 
Sent: Tuesday, January 25, 2022 09:04
Subject: Re: missing indexes in indexlist with partitioned tables
>  [...]
> "partindexlist" really made me think about a list of "partial indexes"
> for some reason.  I think maybe "partedindexlist" is what you are
> looking for; "parted" is commonly used as short for "partitioned" when
> naming variables.
>
> The comment only mentions "further pruning" as to what partitioned
> indexes are to be remembered in RelOptInfo, but it's not clear what
> that means.  It may help to be more specific.

Thanks for the feedback! I've changed that. The current version is attached.

> Finally, I don't understand why we need a separate field to store
> indexes found in partitioned base relations.  AFAICS, nothing but the
> sites you are interested in (relation_has_unique_index_for() and
> rel_supports_distinctness()) would ever bother to look at a
> partitioned base relation's indexlist.  Do you think putting them into
> in indexlist might break something?

I have thought about that before. AFAICT there is nothing in core that
breaks. However, I am not sure I want to mix those two kinds of index nodes.
First of all, the structure is different: partedIndexes don't have physical
attributes, after all. This is a technical implementation detail relating to
the current promise that entries of the indexlist are indexes we can use. And
by use, I mean use for statistics or by the executor.
I'm more concerned that future changes regarding the order and optimization of
processing would be made harder here. The order in which we do things in the
planner is a bit messy, and I wouldn't mind seeing details about that change.
Looking at the current wacky order in the optimizer, I'm not convinced that
nothing will want to have a look at the indexlist before partitioned tables
are unpacked.

Since it would be easy to introduce this new variable later, I wouldn't mind
adding the entries to the indexlist directly for now. But changing the
underlying promise of what it contains seems noteworthy and more intrusive
to me.

> > Side note: I personally think the name inhparent is mildly confusing, since 
> > it's not really about inheritance. I don't have a significantly better idea 
> > though.
>
> Partitioned tables are "inheritance parent", so share the same code as
> what traditional inheritance parents have always used for planning.

I recall manual partitioning via inheritance; that was cumbersome. Though
that minor historical detail was not what I was referring to. There are a lot
of other cases that cause us to set inhparent. IIRC we use this flag in some
DDL commands which have nothing to do with inheritance. It is essentially used
as a way to skip the indexlist creation. If such hacks weren't there, we
could simply check for the relkind and indisunique.

Regards
Arne

diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c
index 0ef70ad7f1..5225076df3 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -3504,7 +3504,7 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
 	Assert(list_length(exprlist) == list_length(oprlist));
 
 	/* Short-circuit if no indexes... */
-	if (rel->indexlist == NIL)
+	if (rel->indexlist == NIL && rel->partedIndexlist == NIL)
 		return false;
 
 	/*
@@ -3549,7 +3549,7 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
 		return false;
 
 	/* Examine each index of the relation ... */
-	foreach(ic, rel->indexlist)
+	foreach(ic, list_concat(rel->indexlist, rel->partedIndexlist))
 	{
 		IndexOptInfo *ind = (IndexOptInfo *) lfirst(ic);
 		int			c;
diff --git a/src/backend/optimizer/plan/analyzejoins.c b/src/backend/optimizer/plan/analyzejoins.c
index 337f470d58..16ce443ec9 100644
--- a/src/backend/optimizer/plan/analyzejoins.c
+++ b/src/backend/optimizer/plan/analyzejoins.c
@@ -23,6 +23,7 @@
 #include "postgres.h"
 
 #include "nodes/nodeFuncs.h"
+#include "nodes/nodes.h"
 #include "optimizer/clauses.h"
 #include "optimizer/joininfo.h"
 #include "optimizer/optimizer.h"
@@ -598,7 +599,7 @@ rel_supports_distinctness(PlannerInfo *root, RelOptInfo *rel)
 		 */
 		ListCell   *lc;
 
-		foreach(lc, rel->indexlist)
+		foreach(lc, list_concat(rel->indexlist, rel->partedIndexlist))
 		{
 			IndexOptInfo *ind = (IndexOptInfo *) lfirst(lc);
 
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index a5002ad895..2451b2ae79 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -116,8 +116,8 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
 {
 	Index		varno = rel->relid;
 	Relation	relation;
-	bool		hasindex;
 	List	   *indexinfos = NIL;
+	List	   *partedIndexinfos = NIL;
 
 	/*
 	 * We need not lock the relation since it was already locked, either by
@@ -154,17 +154,8 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool 

Re: Catalog version access

2022-01-31 Thread Nathan Bossart
On Mon, Jan 31, 2022 at 04:57:13PM +0900, Michael Paquier wrote:
> On Tue, Jan 25, 2022 at 01:12:32PM +0900, Michael Paquier wrote:
>> Once you remove the requirement of a running server, we have basically
>> what has been recently implemented with postgres -C for
>> runtime-computed GUCs, because we already go through a read of the
>> control file to be able to print those GUCs with their correct
>> values.  This also means that it is already possible to check if a
>> data folder is compatible with a set of binaries with this facility,
>> as any postgres -C command with a runtime GUC would trigger this
>> check.  Using any of the existing runtime GUCs may be confusing, but
>> that would work.  And I am not really convinced that we have any need
>> to add a specific GUC for this purpose, be it a sort of
>> is_controlfile_valid or controlfile_checksum (CRC32 of the control
>> file).
> 
> Thinking more about this one, we can already do that, so I have
> marked the patch as RwF.  Perhaps we could just add a GUC, but that
> feels a bit dummy.

Sorry, I missed this thread earlier.  You're right, we can just do
something like the following to achieve basically the same result:

postgres -D . -C data_checksums

Unless Vik has any objections, this can probably be marked as Withdrawn.
Perhaps we can look into providing a new option for "postgres" at some
point in the future, but I don't sense a ton of demand at the moment.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: Deparsing rewritten query

2022-01-31 Thread Julien Rouhaud
Hi,

On Mon, Jan 31, 2022 at 06:46:37PM +0100, Pavel Stehule wrote:
> 
> I checked the last patch.  I think it is almost trivial. I just miss a
> comment explaining why this alias is necessary:
> 
> + if (!rte->alias)
> + rte->alias = makeAlias(get_rel_name(rte->relid), NULL);

Thanks for looking at it Pavel!

The alias is necessary because otherwise queries involving views won't produce
valid SQL, as an alias for a subquery is mandatory.  This was part of the v1
regression tests:

+-- test pg_get_query_def()
+SELECT pg_get_query_def('SELECT * FROM shoe') as def;
+  def
+
+  SELECT shoename, +
+ sh_avail, +
+ slcolor,  +
+ slminlen, +
+ slminlen_cm,  +
+ slmaxlen, +
+ slmaxlen_cm,  +
+ slunit+
+FROM ( SELECT sh.shoename, +
+ sh.sh_avail,  +
+ sh.slcolor,   +
+ sh.slminlen,  +
+ (sh.slminlen * un.un_fact) AS slminlen_cm,+
+ sh.slmaxlen,  +
+ (sh.slmaxlen * un.un_fact) AS slmaxlen_cm,+
+ sh.slunit +
+FROM shoe_data sh, +
+ unit un   +
+   WHERE (sh.slunit = un.un_name)) shoe;   +

the mandatory "shoe" alias is added with that change.

I looked for other similar problems and didn't find anything, but given the
complexity of the SQL standard it's quite possible that I missed some other
corner case.




Re: plperl on windows

2022-01-31 Thread Andres Freund
On 2022-01-31 10:43:31 -0500, Andrew Dunstan wrote:
> We should fix that, maybe along these lines?

WFM.




Re: Deparsing rewritten query

2022-01-31 Thread Pavel Stehule
Hi

On Mon, Jan 31, 2022 at 15:37, Gilles Darold  wrote:

> On 28/06/2021 at 18:41, Julien Rouhaud wrote:
> > Thanks for the feedback Gilles!
> >
> > On Mon, Jun 28, 2021 at 04:06:54PM +0200, Gilles Darold wrote:
> >> If we could at least call get_query_def() through an extension if we
> >> didn't have a function, it would be ideal for DBAs. I agree this is
> >> unusual, but when it does happen to you, being able to call
> >> get_query_def() helps a lot.
> > Since at least 2 other persons seem to be interested in that feature, I can
> > take care of writing and maintaining such an extension, provided that the
> > required infrastructure is available in core.
> >
> > PFA v2 of the patch which only adds the required alias and exposes
> > get_query_def().
>

I checked the last patch.  I think it is almost trivial. I just miss a
comment explaining why this alias is necessary:

+ if (!rte->alias)
+ rte->alias = makeAlias(get_rel_name(rte->relid), NULL);

Regards

Pavel



>
> Thanks a lot Julien, such facilities are really helpful for DBAs and
> set PostgreSQL apart from other RDBMS. I don't think that this feature
> exists elsewhere.
>
>
> --
> Gilles Darold
> http://www.darold.net/


Re: CREATEROLE and role ownership hierarchies

2022-01-31 Thread Mark Dilger



> On Jan 31, 2022, at 8:53 AM, Stephen Frost  wrote:
> 
> Yeah, we do need to have a way to determine who is allowed to drop
> roles and role ownership seems like it's one possible approach to that.

Which other ways are on the table?  Having ADMIN on a role doesn't allow you to 
do that, but maybe it could?  What else?

>> The main idea here is that having CREATEROLE doesn't give you ADMIN on 
>> roles, nor on role attributes.  For role attributes, the syntax has been 
>> extended.  An excerpt from the patch's regression test illustrates some of 
>> that concept:
>> 
>> -- ok, superuser can create a role that can create login replication users, 
>> but
>> -- cannot itself login, nor perform replication
>> CREATE ROLE regress_role_repladmin
>>CREATEROLE WITHOUT ADMIN OPTION -- can create roles, but cannot give 
>> it away
>>NOCREATEDB WITHOUT ADMIN OPTION -- cannot create db, nor give it away
>>NOLOGIN WITH ADMIN OPTION   -- cannot log in, but can give it away
>>NOREPLICATION WITH ADMIN OPTION -- cannot replicate, but can give it 
>> away
>>NOBYPASSRLS WITHOUT ADMIN OPTION;   -- cannot bypassrls, nor give it away
>> 
>> -- ok, superuser can create a role with CREATEROLE but restrict give-aways
>> CREATE ROLE regress_role_minoradmin
>>NOSUPERUSER -- WITHOUT ADMIN OPTION is implied
>>CREATEROLE WITHOUT ADMIN OPTION
>>NOCREATEDB WITHOUT ADMIN OPTION
>>NOLOGIN WITHOUT ADMIN OPTION
>>NOREPLICATION   -- WITHOUT ADMIN OPTION is implied
>>NOBYPASSRLS -- WITHOUT ADMIN OPTION is implied
>>NOINHERIT WITHOUT ADMIN OPTION
>>CONNECTION LIMIT NONE WITHOUT ADMIN OPTION
>>VALID ALWAYS WITHOUT ADMIN OPTION
>>PASSWORD NULL WITHOUT ADMIN OPTION;
> 
> Right, this was one of the approaches that I was thinking could work for
> managing role attributes and it's very similar to roles and the admin
> option for them.  As I suggested at least once, another possible
> approach could be to have login users not be able to create roles but
> for them to be able to SET ROLE to a role which is able to create roles,
> and then, using your prior method, only allow the attributes which that
> role has to be able to be given to other roles.

I'm not sure how that works.  If I have a group named "administrators" which has
multiple attributes like BYPASSRLS and such, and user "stephen" is a member of
"administrators", then stephen can not only give away bypassrls to new users 
but also has it himself.  How is that an improvement?  (I mean this as a 
question, not as criticism.)

>  That essentially makes
> a role be a proxy for the per-attribute admin options.  There's pros and
> cons for each approach and so I'm curious as to which you feel is the
> better approach?  I get the feeling that you're more inclined to go with
> the approach of having an admin option for each role attribute (having
> written this WIP patch) but I'm not sure if that is because you
> contemplated both and felt this was better for some reason or more
> because I wasn't explaining the other approach very well, or if there
> was some other reason.

I need more explanation of the other option you are contemplating.  My 
apologies if I'm being thick-headed.

>> -- fail, having CREATEROLE is not enough to create roles in privileged roles
>> SET SESSION AUTHORIZATION regress_role_minoradmin;
>> CREATE ROLE regress_nosuch_read_all_data IN ROLE pg_read_all_data;
>> ERROR:  must have admin option on role "pg_read_all_data"
> 
> I would say not just privileged roles, but any roles that the user
> doesn't have admin rights on.

Yes, that's how it works.  But this portion of the test is only checking the 
interaction between CREATEROLE and built-in privileged roles, hence the comment.

>> Whether "WITH ADMIN OPTION" or "WITHOUT ADMIN OPTION" is implied hinges on 
>> whether the role is given CREATEROLE.  That hackery is necessary to preserve 
>> backwards compatibility.  If we don't care about compatibility, I could 
>> change the patch to make "WITHOUT ADMIN OPTION" implied for all attributes 
>> when not specified.
> 
> Given the relative size of the changes we're talking about regarding
> CREATEROLE, I don't really think we need to stress about backwards
> compatibility too much.

Yeah, I'm leaning pretty strongly that way, too.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: drop tablespace failed when location contains .. on win32

2022-01-31 Thread Tom Lane
Andrew Dunstan  writes:
> On 1/30/22 16:50, Tom Lane wrote:
>> Here's a revised patch version that does it like that.  I also
>> reviewed and simplified the canonicalize_path logic.  I think
>> this is committable.

> LGTM

Pushed, thanks for looking.

I think I'm also going to have a look at simplifying some of the
dependent code, just because it feels weird to leave that unfinished.
In particular, Shenhao-san suggested upthread that we could remove
path_contains_parent_reference().  I complained about that at the
time, but I hadn't quite absorbed the fact that an absolute path
is now *guaranteed* not to have any ".." after canonicalize_path.
So the existing calls in adminpack.c and genfile.c are certainly
dead code.  We probably want to keep path_contains_parent_reference()
in case some extension is using it, but seeing that its API spec
already requires the input to be canonicalized, it could be simplified
to just check for ".." at the start.
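
Roughly, I'd expect the simplified version to boil down to something like
this (untested sketch):

bool
path_contains_parent_reference(const char *path)
{
	/*
	 * Since the API spec requires the path to be canonicalized already,
	 * any remaining ".." can only appear at the start.
	 */
	return strncmp(path, "..", 2) == 0 &&
		(path[2] == '\0' || path[2] == '/');
}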

regards, tom lane




Re: CREATEROLE and role ownership hierarchies

2022-01-31 Thread Mark Dilger



> On Jan 31, 2022, at 12:43 AM, Michael Banck  wrote:

> Ok, sure. I think this topic is hugely important and as I read the
> patch anyway, I added some comments, but yeah, we need to figure out
> the fundamentals first.

Right.

Perhaps some background on this patch series will help.  The patch versions 
before v8 were creating an owner-owned relationship between the creator and the 
createe, and a lot of privileges were dependent on that ownership.  Stephen 
objected that we were creating parallel tracks on which the privilege system 
was running; things like belonging to a role or having admin on a role were 
partially conflated with owning a role.  He also objected that the pre-v8 patch 
sets allowed a creator role with the CREATEROLE privilege to give away any 
privilege the creator had, rather than needing to have GRANT or ADMIN option on 
the privilege being given.

The v8-WIP patch is not a complete replacement for the pre-v8 patches.  It's 
just a balloon I'm floating to try out candidate solutions to some of Stephen's 
objections.  In the long run, I want the solution to Stephen's objections to 
not create problems for anybody who liked the way the pre-v8 patches worked 
(Robert, Andrew, and to some extent me.)

In this WIP patch, for a creator to give *anything* away to a createe, the 
creator must have GRANT or ADMIN on the thing being given.  That includes 
attributes like BYPASSRLS, CREATEDB, LOGIN, etc., and also ADMIN on any role 
the createe is granted into.

I tried to structure things for backwards compatibility, considering which 
things roles with CREATEROLE could give away historically.  It turns out they 
can give away most everything, but not SUPERUSER, BYPASSRLS, or REPLICATION.  
So I structured the default privileges for CREATEROLE to match.  But I'm 
uncertain that design is any good, and your comments below suggest that you 
find it pretty hard to use.

Part of the problem with trying to be backwards compatible is that we must 
break compatibility anyway, to address the problem that historically having 
CREATEROLE meant you effectively had ADMIN on all non-superuser roles.  That's 
got to change.  So in part I'm asking pgsql-hackers if partial backwards 
compatibility is worth the bother.

If we don't go with backwards compatibility, then CREATEROLE would only allow 
you to create a new role, but not to give that role LOGIN, nor CREATEDB, etc.  
You'd need to also have admin option on those things.  To create a role that 
can give those things away, you'd need to run something like:

CREATE ROLE michael
    CREATEROLE WITH ADMIN OPTION          -- can further give away "createrole"
    CREATEDB WITH ADMIN OPTION            -- can further give away "createdb"
    LOGIN WITH ADMIN OPTION               -- can further give away "login"
    NOREPLICATION WITHOUT ADMIN OPTION    -- this would be implied anyway
    NOBYPASSRLS WITHOUT ADMIN OPTION      -- this would be implied anyway
    CONNECTION LIMIT WITH ADMIN OPTION    -- can specify connection limits
    PASSWORD WITH ADMIN OPTION            -- can specify passwords
    VALID UNTIL WITH ADMIN OPTION         -- can specify expiration

(I'm on the fence about the phrase "WITH ADMIN OPTION" vs. the phrase "WITH 
GRANT OPTION".)

Even then, when "michael" creates new roles, if he wants to be able to further 
administer those roles, he needs to remember to give himself ADMIN membership 
in that role at creation time.  After the role is created, if he doesn't have 
ADMIN, he can't give it to himself.  So, at create time, he needs to remember 
to do this:

SET ROLE michael;
CREATE ROLE mark ADMIN michael;

But that's still a bit strange, because "ADMIN michael" means that michael can 
grant other roles membership in "mark", not that michael can, for example, 
change mark's password.  If we don't want CREATEROLE to imply that you can mess 
around with arbitrary roles (rather than only roles that you created or have 
been transferred control over) then we need the concept of role ownership.  
This patch doesn't go that far, so for now, only superusers can do those 
things.  Assuming some form of this patch is acceptable, the v9 series will 
resurrect some of the pre-v7 logic for role ownership and say that the owner 
can do those things.


>>> One thing I noticed (and which will likely make DBAs grumpy) is that it
>>> seems being able to create users (as opposed to non-login roles/groups)
>>> depends on when you get the CREATEROLE attribute (on role creation or
>>> later), viz:
>>> 
>>> postgres=# CREATE USER admin CREATEROLE;
>>> CREATE ROLE
>>> postgres=# SET ROLE admin;
>>> SET
>>> postgres=> CREATE USER testuser; -- this works
>>> CREATE ROLE
>>> postgres=> RESET ROLE;
>>> RESET
>>> postgres=# CREATE USER admin2;
>>> CREATE ROLE
>>> postgres=# ALTER ROLE admin2 CREATEROLE; -- we get CREATEROLE after the fact
>>> ALTER ROLE
>>> postgres=# SET ROLE admin2;
>>> SET
>>> postgres=> CREATE USER testuser2; -- bam
>>> ERROR:  must have grant 

Re: drop tablespace failed when location contains .. on win32

2022-01-31 Thread Andrew Dunstan


On 1/30/22 16:50, Tom Lane wrote:
> Michael Paquier  writes:
>> In order to make the tests cheap, there is no need to have a separate
>> module in src/test/modules/ as your patch 0002 is doing.  Instead, you
>> should move the C code of your SQL function test_canonicalize_path()
>> to src/test/regress/regress.c, then add some tests in
>> src/test/regress/sql/, with a SQL function created in the test script
>> that feeds from what would be added to regress.so.
> Here's a revised patch version that does it like that.  I also
> reviewed and simplified the canonicalize_path logic.  I think
> this is committable.


LGTM


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: CREATEROLE and role ownership hierarchies

2022-01-31 Thread Stephen Frost
Greetings,

* Mark Dilger (mark.dil...@enterprisedb.com) wrote:
> > On Jan 25, 2022, at 12:44 PM, Stephen Frost  wrote:
> > I agree that CREATEROLE is overpowered and that the goal of this should
> > be to provide a way for roles to be created and dropped that doesn't
> > give the user who has that power everything that CREATEROLE currently
> > does.
> 
> I'm attaching a patch that attempts to fix CREATEROLE without any connection 
> to role ownership.

Alright.

> >  The point I was making is that the concept of role ownership
> > isn't intrinsically linked to that and is, therefore, as you say, gravy.
> 
> I agree, they aren't intrinsically linked, though the solution to one might 
> interact in some ways with the solution to the other.

Sure.

> > That isn't to say that I'm entirely against the role ownership idea but
> > I'd want it to be focused on the goal of providing ways of creating and
> > dropping users and otherwise performing that kind of administration and
> > that doesn't require the specific change to make owners be members of
> > all roles they own and automatically have all privileges of those roles
> > all the time.
> 
> The attached WIP patch attempts to solve most of the CREATEROLE problems but 
> not the problem of which role who can drop which other role.  That will 
> likely require an ownership concept.

Yeah, we do need to have a way to determine who is allowed to drop
roles and role ownership seems like it's one possible approach to that.

> The main idea here is that having CREATEROLE doesn't give you ADMIN on roles, 
> nor on role attributes.  For role attributes, the syntax has been extended.  
> An excerpt from the patch's regression test illustrates some of that concept:
> 
> -- ok, superuser can create a role that can create login replication users, 
> but
> -- cannot itself login, nor perform replication
> CREATE ROLE regress_role_repladmin
> CREATEROLE WITHOUT ADMIN OPTION -- can create roles, but cannot give 
> it away
> NOCREATEDB WITHOUT ADMIN OPTION -- cannot create db, nor give it away
> NOLOGIN WITH ADMIN OPTION   -- cannot log in, but can give it away
> NOREPLICATION WITH ADMIN OPTION -- cannot replicate, but can give it 
> away
> NOBYPASSRLS WITHOUT ADMIN OPTION;   -- cannot bypassrls, nor give it away
> 
> -- ok, superuser can create a role with CREATEROLE but restrict give-aways
> CREATE ROLE regress_role_minoradmin
> NOSUPERUSER -- WITHOUT ADMIN OPTION is implied
> CREATEROLE WITHOUT ADMIN OPTION
> NOCREATEDB WITHOUT ADMIN OPTION
> NOLOGIN WITHOUT ADMIN OPTION
> NOREPLICATION   -- WITHOUT ADMIN OPTION is implied
> NOBYPASSRLS -- WITHOUT ADMIN OPTION is implied
> NOINHERIT WITHOUT ADMIN OPTION
> CONNECTION LIMIT NONE WITHOUT ADMIN OPTION
> VALID ALWAYS WITHOUT ADMIN OPTION
> PASSWORD NULL WITHOUT ADMIN OPTION;

Right, this was one of the approaches that I was thinking could work for
managing role attributes and it's very similar to roles and the admin
option for them.  As I suggested at least once, another possible
approach could be to have login users not be able to create roles but
for them to be able to SET ROLE to a role which is able to create roles,
and then, using your prior method, only allow the attributes which that
role has to be able to be given to other roles.  That essentially makes
a role be a proxy for the per-attribute admin options.  There's pros and
cons for each approach and so I'm curious as to which you feel is the
better approach?  I get the feeling that you're more inclined to go with
the approach of having an admin option for each role attribute (having
written this WIP patch) but I'm not sure if that is because you
contemplated both and felt this was better for some reason or more
because I wasn't explaining the other approach very well, or if there
was some other reason.
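
To spell out that alternative with made-up names, the flow I have in mind
is roughly:

CREATE ROLE role_creator CREATEROLE NOLOGIN;
CREATE ROLE alice LOGIN;           -- no CREATEROLE herself
GRANT role_creator TO alice;

SET ROLE role_creator;             -- alice switches hats to create roles
CREATE ROLE bob LOGIN;

where only the attributes that role_creator itself has could be given to
the roles it creates.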

> -- fail, having CREATEROLE is not enough to create roles in privileged roles
> SET SESSION AUTHORIZATION regress_role_minoradmin;
> CREATE ROLE regress_nosuch_read_all_data IN ROLE pg_read_all_data;
> ERROR:  must have admin option on role "pg_read_all_data"

I would say not just privileged roles, but any roles that the user
doesn't have admin rights on.

> Whether "WITH ADMIN OPTION" or "WITHOUT ADMIN OPTION" is implied hinges on 
> whether the role is given CREATEROLE.  That hackery is necessary to preserve 
> backwards compatibility.  If we don't care about compatibility, I could 
> change the patch to make "WITHOUT ADMIN OPTION" implied for all attributes 
> when not specified.

Given the relative size of the changes we're talking about regarding
CREATEROLE, I don't really think we need to stress about backwards
compatibility too much.

Thanks,

Stephen


signature.asc
Description: PGP signature


Re: RFC: Logging plan of the running query

2022-01-31 Thread Fujii Masao




On 2022/01/28 17:45, torikoshia wrote:

There is the case where the request to log a query plan is skipped
even while the target backend is running a query. If this happens,
users can just retry pg_log_query_plan(). These things should be
documented?


Agreed.
Added following:

   +    Note that there is the case where the request to log a query
   +    plan is skipped even while the target backend is running a
   +    query due to lock conflict avoidance.
   +    If this happens, users can just retry pg_log_query_plan().


This may cause users to misunderstand that pg_log_query_plan() fails while the target 
backend is holding *any* locks? Isn't it better to mention "page-level locks", 
instead? So how about the following?

--
Note that the request to log the query plan will be ignored if it's received 
during a short period while the target backend is holding a page-level lock.
--



+    ereport(LOG_SERVER_ONLY,
+    errmsg("backend with PID %d is holding a page lock. Try again",
+    MyProcPid));

It seems more proper to output this message in DETAIL or HINT,
instead. So how about something like the following messages?

LOG: could not log the query plan
DETAIL: query plan cannot be logged while page level lock is being held
HINT: Try pg_log_query_plan() after a few 


Agreed.
I felt the HINT message 'after a few ...' was difficult to phrase, and wrote it
as below.

| HINT: Retrying pg_log_query_plan() might succeed since the lock duration of 
page level locks are usually short

How do you think?


Or we don't need HINT message?


+   errmsg("could not log the query plan"),
+   errdetail("query plan cannot be logged while page 
level lock is being held"),

In detail message, the first word of sentences should be capitalized. How about 
"Cannot log the query plan while holding page-level lock.", instead?


Thanks for updating the patch! Here are some review comments.

+  
+   
+
+ pg_log_query_plan
+

This entry is placed before one for pg_log_backend_memory_contexts(). But it 
should be *after* that since those entries seem to be placed in alphabetical 
order in the table?


+Requests to log the plan of the query currently running on the
+backend with specified process ID along with the untruncated
+query string.

Other descriptions about logging of query string seem not to mention something like 
"untruncated query string". For example, auto_explain, log_statement, etc. Why do we need 
to mention "along with the untruncated query string" specially for pg_log_query_plan()?


+Note that nested statements (statements executed inside a function) are not
+considered for logging. Only the plan of the most deeply nested query is 
logged.

Now the plan of even a nested statement can be logged. So this description needs
to be updated?


@@ -440,6 +450,7 @@ standard_ExecutorFinish(QueryDesc *queryDesc)
 
 	MemoryContextSwitchTo(oldcontext);
 
+	ActiveQueryDesc = NULL;


ActiveQueryDesc seems unnecessary. Why does ActiveQueryDesc need to be reset to 
NULL in standard_ExecutorFinish()?


Currently, even during ProcessLogQueryPlanInterrupt(), pg_log_query_plan() can be called and
another ProcessLogQueryPlanInterrupt() can be executed. So repeated re-entries into
ProcessLogQueryPlanInterrupt() could cause a "stack depth limit exceeded" error.
To avoid this, shouldn't we make ProcessLogQueryPlanInterrupt() do nothing and return
immediately if it's called during another ProcessLogQueryPlanInterrupt()?
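
Something along these lines is what I mean (a rough sketch only, and
error recovery for the flag is omitted):

	void
	ProcessLogQueryPlanInterrupt(void)
	{
		static bool in_progress = false;

		if (in_progress)
			return;		/* ignore nested requests */
		in_progress = true;

		/* ... existing code to explain and log the current plan ... */

		in_progress = false;
	}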

pg_log_backend_memory_contexts() also might have the same issue.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: DELETE CASCADE

2022-01-31 Thread Julien Rouhaud
Hi,

On Mon, Jan 31, 2022 at 09:14:21AM -0600, David Christensen wrote:
> 
> Sounds good; when I get time to look at this again I will resubmit (if
> people think the base functionality is worth it, which is still a topic of
> discussion).

Yes, please do!  Sorry, I should have mentioned it: if a patch is closed as
Returned with Feedback it means that the feature is wanted, which was my
understanding from reading the backlog of this thread, so submitting it in some
later commitfest is expected.




Re: Multiple Query IDs for a rewritten parse tree

2022-01-31 Thread Dmitry Dolgov
> On Mon, Jan 31, 2022 at 02:59:17PM +0500, Andrey V. Lepikhov wrote:
> On 1/28/22 9:51 PM, Dmitry Dolgov wrote:
> > > On Fri, Jan 21, 2022 at 11:33:22AM +0500, Andrey V. Lepikhov wrote:
> > > Registration of an queryId generator implemented by analogy with 
> > > extensible
> > > methods machinery.
> >
> > Why not more like suggested with stakind and slots in some data
> > structure? All of those generators have to be iterated anyway, so not
> > sure if a hash table makes sense.
> Maybe. But it is not obvious. We don't really know how many extensions
> could set a queryId.
> For example, adaptive planning extensions definitely want to set a unique
> id (for example, a simplistic counter) to trace a specific {query,plan} across
> all executions (remember the plancache too). And they would register a personal
> generator for that purpose.

I don't see how the number of extensions justifies a hash table? I mean,
all of the generators will still most likely be executed at once (not on
demand) and store the result somewhere.

In any case I would like to point out that this functionality wants to be
on a pretty hot path, which means strong evidence of no
significant performance overhead is needed. And sure, there could be
multiple queryId consumers, but in the vast majority of cases only one
queryId will be generated.
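
To illustrate what I mean by slots instead of a hash table (names are
made up, just a sketch):

#define MAX_QUERYID_GENERATORS	8

typedef uint64 (*queryid_generator) (Query *query, const char *query_text);

static queryid_generator queryid_generators[MAX_QUERYID_GENERATORS];
static int	num_queryid_generators = 0;

void
RegisterQueryIdGenerator(queryid_generator fn)
{
	if (num_queryid_generators >= MAX_QUERYID_GENERATORS)
		elog(ERROR, "too many queryId generators");
	queryid_generators[num_queryid_generators++] = fn;
}

Iterating over a small fixed array like that should be essentially free
compared to a hash lookup.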

> > > 2. We need a custom queryId, that is based on a generated queryId 
> > > (according
> > > to the logic of pg_stat_statements).
> >
> > Could you clarify?
> pg_stat_statements uses the original queryId and changes it for a reason
> (sometimes zeroing it, sometimes not).

IIRC it does that only for utility statements. I still fail to see the
problem, why would a custom extension not register a new generator if it
needs something different?

> > > 5. We should add an EXPLAIN hook, to allow an extension to print this 
> > > custom
> > > queryId.
> >
> > Why? It would make sense if custom generation code will be generating
> > some complex structure, but the queryId itself is still a hash.
> >
> An extension can print not only the queryId, but also an explanation of a kind, maybe
> additional logic.
> Moreover, why can't an extension show some useful monitoring data, collected
> during a query execution, in verbose mode?

A good recommendation which pops up every now and then in hackers is to
concentrate on what is immediately useful, leaving "maybe useful" things
for later. I have a strong feeling this is the case here.




Re: Support for NSS as a libpq TLS backend

2022-01-31 Thread Stephen Frost
Greetings,

* Daniel Gustafsson (dan...@yesql.se) wrote:
> > On 28 Jan 2022, at 15:30, Robert Haas  wrote:
> > On Fri, Jan 28, 2022 at 9:08 AM Daniel Gustafsson  wrote:
> >>> Kinda makes me question the wisdom of starting to depend on NSS. When 
> >>> openssl
> >>> docs are vastly outshining a library's, that library really should start 
> >>> to
> >>> ask itself some hard questions.
> > 
> > Yeah, OpenSSL is very poor, so being worse is not good.
> > 
> >> Sadly, there is that.  While this is not a new problem, Mozilla has been 
> >> making
> >> some very weird decisions around NSS governance as of late.  Another data 
> >> point
> >> is the below thread from libcurl:
> >> 
> >>https://curl.se/mail/lib-2022-01/0120.html
> > 
> > I would really, really like to have an alternative to OpenSSL for PG.
> > I don't know if this is the right thing, though. If other people are
> > dropping support for it, that's a pretty bad sign IMHO. Later in the
> > thread it says OpenLDAP have dropped support for it already as well.
> 
> I'm counting this and Andres' comment as a -1 on the patchset, and given where
> we are in the cycle I'll mark it rejected in the CF app shortly unless anyone
> objects.

I agree that it's concerning to hear that OpenLDAP dropped support for
NSS... though I don't seem to be able to find any information as to why
they decided to do so.  NSS is clearly still supported and maintained
and they do seem to understand that they need to work on the
documentation situation and to get that fixed (the current issue seems
to be around NSS vs. NSPR and the migration off of MDN to the in-tree
documentation as Daniel mentioned, if I followed the discussion
correctly in the bug that was filed by the curl folks and was then
actively responded to by the NSS/NSPR folks), which seems to be the main
issue that's being raised about it by the curl folks and here.

I'm also very much a fan of having an alternative to OpenSSL and the
NSS/NSPR license fits well for us, unlike the alternatives to OpenSSL
used by other projects, such as GnuTLS (which is the alternative to
OpenSSL that OpenLDAP now has) or other libraries like wolfSSL.

Beyond the documentation issue, which I agree is a concern but also
seems to be actively realized as an issue by the NSS/NSPR folks, is
there some other reason that the curl folks are thinking of dropping
support for it?  Or does anyone have insight into why OpenLDAP decided
to remove support?

Thanks,

Stephen


signature.asc
Description: PGP signature


Re: plperl on windows

2022-01-31 Thread Andrew Dunstan


On 10/4/21 18:02, Andres Freund wrote:
> Hi,
>
> On 2021-10-04 14:38:16 -0700, Andres Freund wrote:
>> 2) For some reason src/tools/install.pl doesn't install plperl[u].control,
>>plperl[u]--1.0.sql - But apparently the buildfarm doesn't have that issue,
>>because drongo successfully ran the plperl tests?
> Oh, figured that one out: Install.pm checks the current directory for
> config.pl - but my invocation was from the source tree root (which is
> supported for most things). Because of that it skipped installing plperl, as
> it though it wasn't installed.


We should fix that, maybe along these lines?


diff --git a/src/tools/msvc/Install.pm b/src/tools/msvc/Install.pm
index 8de79c618c..75e91f73b3 100644
--- a/src/tools/msvc/Install.pm
+++ b/src/tools/msvc/Install.pm
@@ -59,6 +59,8 @@ sub Install
    our $config = shift;
    unless ($config)
    {
+   # we expect config.pl and config_default.pl to be here
+   chdir 'src/tools/msvc' if -d 'src/tools/msvc';
 
    # suppress warning about harmless redeclaration of $config
    no warnings 'misc';


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: DELETE CASCADE

2022-01-31 Thread David Christensen
On Sun, Jan 30, 2022 at 9:47 PM Julien Rouhaud  wrote:

> Hi,
>
> On Tue, Jan 25, 2022 at 10:26:53PM +0800, Julien Rouhaud wrote:
> >
> > It's been almost 4 months since your last email, and almost 2 weeks
> since the
> > notice that this patch doesn't apply anymore.  Without update in the next
> > couple of days this patch will be closed as Returned with Feedback per
> the
> > commitfest rules.
>
> Closed.
>

Sounds good; when I get time to look at this again I will resubmit (if
people think the base functionality is worth it, which is still a topic of
discussion).

David


Re: Add checkpoint and redo LSN to LogCheckpointEnd log message

2022-01-31 Thread Stephen Frost
Greetings,

* Bharath Rupireddy (bharath.rupireddyforpostg...@gmail.com) wrote:
> On Fri, Jan 28, 2022 at 11:16 AM Nathan Bossart
>  wrote:
> > I know I voted for "start=%X/%X, end=%X/%X," but looking at this again, I
> > wonder if it could be misleading.  "start" is the redo location, and "end"
> > is the location of the checkpoint record, but I could understand why
> > someone might think that "start" is the location of the previous checkpoint
> > record and "end" is the redo location of the new one.  I think your
> > original idea of "lsn=%X/%X, redo lsn=%X/%X" could be a good alternative.
> >
> > In any case, this patch is small and otherwise looks reasonable to me, so I
> > am going to mark it as ready-for-committer.
> 
> Thanks for your review. In summary, we have these options to choose
> checkpoint LSN and last REDO LSN:
> 
> 1)  "start=%X/%X, end=%X/%X" (ControlFile->checkPointCopy.redo,
> ControlFile->checkPoint)
> 2)  "lsn=%X/%X, redo lsn=%X/%X"
> 3) "location=%X/%X, REDO location=%X/%X" -- similar to what
> pg_controldata and pg_control_checkpoint shows currently.
> 4) "location=%X/%X, REDO start location=%X/%X"
> 
> I will leave that decision to the committer.

I'd also vote for #2.  Regarding 3 and 4, I'd argue that those should
have been changed when we changed a number of other things from the
generic 'location' to be 'lsn' in system views and functions, and
therefore we should go change those to also specify 'lsn' rather than
just saying 'location'.

Thanks,

Stephen


signature.asc
Description: PGP signature


Re: Make relfile tombstone files conditional on WAL level

2022-01-31 Thread Dilip Kumar
On Mon, Jan 31, 2022 at 7:36 PM Robert Haas  wrote:
>
> On Mon, Jan 31, 2022 at 9:04 AM Robert Haas  wrote:
> > On Mon, Jan 31, 2022 at 12:29 AM Dilip Kumar  wrote:
> > > the main one currently we do not have uint8 data
> > > type only int8 is there so I have used int8 for storing relfilenode +
> > > forknumber.
> >
> > I'm confused. We use int8 in tons of places, so I feel like it must exist.
>
> Rather, we use uint8 in tons of places, so I feel like it must exist.

Hmm, at least pg_type doesn't have anything with a name like uint8.

postgres[101702]=# select oid, typname from pg_type where typname like '%int8';
 oid  | typname
--+-
   20 | int8
 1016 | _int8
(2 rows)

postgres[101702]=# select oid, typname from pg_type where typname like '%uint%';
 oid | typname
-+-
(0 rows)

I agree that we are using an 8-byte unsigned int in multiple places in the code
as uint64.  But I don't see it as an exposed data type, and it is not used as
part of any exposed function.  But we will have to use the relfilenode
in an exposed C function, e.g.
binary_upgrade_set_next_heap_relfilenode().

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: make MaxBackends available in _PG_init

2022-01-31 Thread Robert Haas
On Fri, Jan 28, 2022 at 9:19 PM Michael Paquier  wrote:
> As of the issues of this thread, we really have two things to think
> about:
> 1) How do we want to control the access of some parameters in a
> context or another?  One idea would be more control through GUCs, say
> with a set of context-related flags that prevent the read of some
> variables until they are set.  We could encourage the use of
> GetConfigOption() for that.  For MaxBackends, we could add a read-only
> GUC for this purpose.  That's what Andres hinted at upthread, I
> guess.
> 2) How do we deal with unwanted access of shared parameters?  This one
> is not really controllable, is it?  And we are talking about much more
> than MaxBackends.  This could perhaps be addressed with more
> documentation in the headers for the concerned variables, as a first
> step.

I think you are mischaracterizing the problem. MaxBackends is not
max_connections. It is a value which is computed based on the value of
max_connections and various other GUCs. And that's the problem. If you
have a C variable whose value should depend on a single GUC, the GUC
system will generally do what you want automatically. If it doesn't,
you can use the GUC check hook/assign hook machinery to update as many
global variables as you like whenever the GUC itself is updated.
Moreover, a GUC always has some legal and thus halfway reasonable
value. We read postgresql.conf super-early in the startup sequence,
because we know GUCs are super-important, and even before that
settings have bootstrap values which are not usually totally insane.
The bottom line is that if you have a global variable whose value
depends on one GUC, you can probably just read that global variable
and call it good.
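
For the single-GUC case the machinery is as simple as this generic
sketch (not tied to any particular GUC):

static int	my_derived_value;

static void
assign_my_guc(int newval, void *extra)
{
	/* keep the derived C variable in sync with the GUC */
	my_derived_value = newval * 2;
}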

The main reason why this doesn't work for MaxBackends is that
MaxBackends depends on the values of multiple GUCs. There is a further
wrinkle too, which is that none of those GUCs can change, and
therefore code does things with the resulting value on the assumption
that they won't change, like size shared-memory data structures.
Therefore, if you read the wrong value, you've got a big problem. So
the real issues here IMHO are about the difficulty of making sure that
(1) when a GUC changes, we update all of the things that depend on it
including things which may also depend on other GUCs and (2) making
sure that values which can't ever change are computed before they are
used.

I don't know what the solution to problem #1 is, but the solution to
problem #2 is simple: make people call a function to get the value
rather than just reading a bare variable. GetConfigOption() is not a
good solution for people aiming to write C code that does useful
things, because it delivers the value as a string, and that is not
what you want. But an accessor function like GetMaxBackends() for a
quantity of this type is wonderful. Depending on the situation, you
might choose to have the accessor function [a] fail an assertion if
the value is not available yet or [b] compute the value if they value
has not yet been computed or [c] do the latter if possible, otherwise
the former. But the fact that you are making code call a function
rather than just read a variable gives you a very strong tool to make
sure that someone can't blindly read a 0 or whatever instead of the
real value.
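
In other words, something like this sketch of option [a] (the actual
patch may differ in details):

static int	MaxBackendsValue = 0;

int
GetMaxBackends(void)
{
	/* fail hard if the value has not been computed yet */
	Assert(MaxBackendsValue != 0);
	return MaxBackendsValue;
}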

Therefore, it is my opinion that the proposed patch was pretty much
right on the money and that we ought to get it committed.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: refactoring basebackup.c

2022-01-31 Thread Jeevan Ladhe
> I think you are right, I have removed the message and again introduced
> the Assert() back.
>
In my previous version of the patch this was a problem: basically, there should
not be an assert, as the code is still reachable whether it is server-lz4 or
client-lz4.
I removed the assert and added a level range check there, similar to gzip.

Regards,
Jeevan Ladhe


v12-0001-Add-a-LZ4-compression-method-for-server-side-compres.patch
Description: Binary data


Re: Add connection active, idle time to pg_stat_activity

2022-01-31 Thread Sergey Dudoladov
Hi,

Thank you for the reviews.

> > The write operation to beentry needs to be enclosed by
> > PGSTAT_BEGIN/END_WRITE_ACTIVITY().  In that perspective, it would be
> > better to move that writes to the PGSTAT_WRITE_ACTIVITY section just
> > below.

I have fixed it in the new version.

> >   if (beentry->st_state == STATE_RUNNING ||
> >   beentry->st_state == STATE_FASTPATH)
> > - pgstat_count_conn_active_time((PgStat_Counter) secs * 1000000 + usecs);
> > + {
> > + pgstat_count_conn_active_time((PgStat_Counter) usecs_diff);
> > + beentry->st_total_active_time += usecs_diff;
> > + }
> >
> > The two lines operate exactly the same way on variables with slightly
> > different behavior. pgStatActiveTime is reported at transaction end
> > and reset at every tabstat reporting. st_total_active_time is reported
> > immediately and reset at session end. Since we do the latter, the
> > first can be omitted by remembering the last values for the local
> > variables at every reporting.  This needs two additional exporting
>
> Of course it's typo(?) of "values of the shared variables".

Could you please elaborate on this idea?
So we have pgStatActiveTime and pgStatIdleInTransactionTime, ultimately
used to report the respective metrics in pg_stat_database.
Now beentry's st_total_active_time / st_total_transaction_idle_time
duplicate this info, so one may get rid of the pgStat*Time counters. Is
the idea to report, instead of them, at every tabstat reporting the
difference between the last memorized value of st_total_*_time and
its current value?

> > This needs two additional exporting
> > functions in pgstatfuncs, like pgstat_get_my_queryid, so others might
> > think differently.

What would be example functions to look at?

Regards,
Sergey
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 62f2a3332b..25290d1260 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -967,6 +967,28 @@ postgres   27093  0.0  0.0  30096  2752 ?Ss   11:34   0:00 postgres: ser
additional types.
   
  
+
+ 
+  
+   active_time double precision
+  
+  
+   Time in milliseconds this backend spent in active and
+   fastpath states. This value is
+   updated when a backend switches from these states to a new state.
+  
+ 
+
+ 
+  
+   idle_in_transaction_time double precision
+  
+  
+   Time in milliseconds this backend spent in idle in transaction
+   and idle in transaction (aborted) states. This value is
+   updated when a backend switches from these states to a new state.
+  
+ 
 

   
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 3cb69b1f87..e349709c05 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -841,7 +841,9 @@ CREATE VIEW pg_stat_activity AS
 s.backend_xmin,
 S.query_id,
 S.query,
-S.backend_type
+S.backend_type,
+S.active_time,
+S.idle_in_transaction_time
 FROM pg_stat_get_activity(NULL) AS S
 LEFT JOIN pg_database AS D ON (S.datid = D.oid)
 LEFT JOIN pg_authid AS U ON (S.usesysid = U.oid);
diff --git a/src/backend/utils/activity/backend_status.c b/src/backend/utils/activity/backend_status.c
index 2fecf26a2c..31adde5ffe 100644
--- a/src/backend/utils/activity/backend_status.c
+++ b/src/backend/utils/activity/backend_status.c
@@ -468,6 +468,13 @@ pgstat_beshutdown_hook(int code, Datum arg)
 
 	beentry->st_procpid = 0;	/* mark invalid */
 
+	/*
+	 * Reset per-backend counters so that accumulated values for the current
+	 * backend are not used for future backends.
+	 */
+	beentry->st_total_active_time = 0;
+	beentry->st_total_transaction_idle_time = 0;
+
 	PGSTAT_END_WRITE_ACTIVITY(beentry);
 
 	/* so that functions can check if backend_status.c is up via MyBEEntry */
@@ -524,6 +531,8 @@ pgstat_report_activity(BackendState state, const char *cmd_str)
 	TimestampTz start_timestamp;
 	TimestampTz current_timestamp;
 	int			len = 0;
+	int64		active_time_diff = 0;
+	int64		transaction_idle_time_diff = 0;
 
 	TRACE_POSTGRESQL_STATEMENT_STATUS(cmd_str);
 
@@ -550,6 +559,9 @@ pgstat_report_activity(BackendState state, const char *cmd_str)
 			beentry->st_xact_start_timestamp = 0;
 			beentry->st_query_id = UINT64CONST(0);
 			proc->wait_event_info = 0;
+
+			beentry->st_total_active_time = 0;
+			beentry->st_total_transaction_idle_time = 0;
 			PGSTAT_END_WRITE_ACTIVITY(beentry);
 		}
 		return;
@@ -583,16 +595,33 @@ pgstat_report_activity(BackendState state, const char *cmd_str)
 	{
 		long		secs;
 		int			usecs;
+		int64		usecs_diff;
 
 		TimestampDifference(beentry->st_state_start_timestamp,
 			current_timestamp,
							&secs, &usecs);
		usecs_diff = secs * 1000000 + usecs;

Re: Make relfile tombstone files conditional on WAL level

2022-01-31 Thread Robert Haas
On Mon, Jan 31, 2022 at 9:04 AM Robert Haas  wrote:
> On Mon, Jan 31, 2022 at 12:29 AM Dilip Kumar  wrote:
> > the main one: currently we do not have a uint8 data
> > type, only int8, so I have used int8 for storing relfilenode +
> > forknumber.
>
> I'm confused. We use int8 in tons of places, so I feel like it must exist.

Rather, we use uint8 in tons of places, so I feel like it must exist.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Make relfile tombstone files conditional on WAL level

2022-01-31 Thread Robert Haas
On Mon, Jan 31, 2022 at 12:29 AM Dilip Kumar  wrote:
> the main one: currently we do not have a uint8 data
> type, only int8, so I have used int8 for storing relfilenode +
> forknumber.

I'm confused. We use int8 in tons of places, so I feel like it must exist.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: refactoring basebackup.c

2022-01-31 Thread Robert Haas
On Mon, Jan 31, 2022 at 6:11 AM Jeevan Ladhe
 wrote:
> I had an offline discussion with Dipesh, and he will be working on the
> lz4 client side decompression part.

OK. I guess we should also be thinking about client-side LZ4
compression. It's probably best to focus on that before worrying about
ZSTD, even though ZSTD would be really cool to have.

>> - In the new test case you set decompress_flags but according to the
>> documentation I have here, -m is for multiple files (and so should not
>> be needed here) and -d is for decompression (which is what we want
>> here). So I'm confused why this is like this.
>
> As explained earlier, in the TAP test the 'lz4 -d base.tar.lz4' command was
> sending the decompressed output to stdout. Now I have removed the '-m',
> added '-d' for decompression, and also specified the target file explicitly
> in the command.

I don't see the behavior you describe here. For me:

[rhaas ~]$ lz4 q.lz4
Decoding file q
q.lz4: decoded 3785 bytes
[rhaas ~]$ rm q
[rhaas ~]$ lz4 -m q.lz4
[rhaas ~]$ ls q
q
[rhaas ~]$ rm q
[rhaas ~]$ lz4 -d q.lz4
Decoding file q
q.lz4: decoded 3785 bytes
[rhaas ~]$ rm q
[rhaas ~]$ lz4 -d -m q.lz4
[rhaas ~]$ ls q
q

In other words, on my system, the file gets decompressed with or
without -d, and with or without -m. The only difference I see is that
using -m makes it happen silently, without printing anything on the
terminal. Anyway, I wasn't saying that using -m was necessarily wrong,
just that I didn't understand why you had it like that. Now that I'm
more informed, I recommend that we use -d -m, the former to be
explicit about wanting to decompress and the latter because it either
makes it less noisy (on my system) or makes it work at all (on yours).
It's surprising that the command behavior would be different like that
on different systems, but it is what it is. I think any set of flags
we put here is better than adding more logic in Perl, as it keeps
things simpler.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Support for NSS as a libpq TLS backend

2022-01-31 Thread Daniel Gustafsson
> On 28 Jan 2022, at 15:30, Robert Haas  wrote:
> 
> On Fri, Jan 28, 2022 at 9:08 AM Daniel Gustafsson  wrote:
>>> Kinda makes me question the wisdom of starting to depend on NSS. When
>>> openssl docs are vastly outshining a library's, that library really
>>> should start to ask itself some hard questions.
> 
> Yeah, OpenSSL is very poor, so being worse is not good.
> 
>> Sadly, there is that.  While this is not a new problem, Mozilla has been
>> making some very weird decisions around NSS governance as of late.
>> Another data point is the below thread from libcurl:
>> 
>>https://curl.se/mail/lib-2022-01/0120.html
> 
> I would really, really like to have an alternative to OpenSSL for PG.
> I don't know if this is the right thing, though. If other people are
> dropping support for it, that's a pretty bad sign IMHO. Later in the
> thread it says OpenLDAP have dropped support for it already as well.

I'm counting this and Andres' comment as a -1 on the patchset, and given where
we are in the cycle I'll mark it rejected in the CF app shortly unless anyone
objects.

--
Daniel Gustafsson   https://vmware.com/





Re: row filtering for logical replication

2022-01-31 Thread Amit Kapila
On Mon, Jan 31, 2022 at 1:08 PM Amit Kapila  wrote:
>
> On Mon, Jan 31, 2022 at 7:27 AM houzj.f...@fujitsu.com
>  wrote:
> >
> > On Monday, January 31, 2022 8:53 AM Peter Smith  
> > wrote:
> > >
> > > PSA v73*.
> > >
> > > (A rebase was needed due to recent changes in tab-complete.c.
> > > Otherwise, v73* is the same as v72*).
> >
> > Thanks for the rebase.
> > Attach the V74 patch set which did the following changes:
> >
>
> Few comments:
> =
>

Few more minor comments:
1.
+ if (relentry->attrmap)
+ {
+ TupleDesc tupdesc  = RelationGetDescr(relation);
+ TupleTableSlot *tmp_slot = MakeTupleTableSlot(tupdesc,
+   &TTSOpsVirtual);
+
+ new_slot = execute_attr_map_slot(relentry->attrmap,
+ new_slot,
+ tmp_slot);

I think we don't need these additional variables tupdesc and tmp_slot.
You can directly use MakeTupleTableSlot instead of tmp_slot, which
will make this and nearby code look better.
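
Presumably something along these lines:

/* Sketch of the suggested simplification, without the extra variables: */
if (relentry->attrmap)
	new_slot = execute_attr_map_slot(relentry->attrmap,
									 new_slot,
									 MakeTupleTableSlot(RelationGetDescr(relation),
														&TTSOpsVirtual));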

2.
+ if (pubrinfo->pubrelqual)
+ appendPQExpBuffer(query, " WHERE (%s)", pubrinfo->pubrelqual);
+ appendPQExpBufferStr(query, ";\n");

Do we really need the additional '()' for the row filter expression here? See
the below output from pg_dump:

ALTER PUBLICATION pub1 ADD TABLE ONLY public.t1 WHERE ((c1 < 100));

3.
+ /* row filter (if any) */
+ if (pset.sversion >= 15)
+ {
+ if (!PQgetisnull(result, i, 1))
+ appendPQExpBuffer(, " WHERE %s", PQgetvalue(result, i, 1));
+ }

I don't think we need this version check if, while forming the query, we use
NULL as the second column in the corresponding query for v < 15.
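
I.e., roughly the following when building the query (a sketch; the
pg_publication_rel.prqual column expression is assumed from the patch):

/* Hypothetical sketch: make the query itself return NULL for the row
 * filter on pre-v15 servers, so the printing code needs no version check. */
if (pset.sversion >= 150000)
	appendPQExpBufferStr(&buf, ", pg_get_expr(pr.prqual, c.oid)");
else
	appendPQExpBufferStr(&buf, ", NULL");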

-- 
With Regards,
Amit Kapila.




Re: logical replication empty transactions

2022-01-31 Thread Ajin Cherian
On Sun, Jan 30, 2022 at 7:04 PM osumi.takami...@fujitsu.com
 wrote:
>
> On Thursday, January 27, 2022 9:57 PM Ajin Cherian  wrote:
> Hi, thanks for your patch update.
>
>
> > On Wed, Jan 26, 2022 at 8:33 PM osumi.takami...@fujitsu.com
> >  wrote:
> > >
> > > On Tuesday, January 11, 2022 6:43 PM Ajin Cherian 
> > wrote:
> > > (3) Is it this patch's responsibility to initialize the data in
> > pgoutput_begin_prepare_txn?
> > >
> > > @@ -433,6 +487,8 @@ static void
> > >  pgoutput_begin_prepare_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn)
> > >  {
> > > 	bool		send_replication_origin = txn->origin_id != InvalidRepOriginId;
> > > +	PGOutputTxnData *txndata = MemoryContextAllocZero(ctx->context,
> > > +													  sizeof(PGOutputTxnData));
> > >
> > > 	OutputPluginPrepareWrite(ctx, !send_replication_origin);
> > > 	logicalrep_write_begin_prepare(ctx->out, txn);
> > >
> > >
> > > Even if we need this initialization for either the non-streaming case
> > > or the non-two_phase case, there can be another issue.
> > > We don't free the allocated memory for this data, right?
> > > There's only one place that uses free in the entire patch, which is in
> > > pgoutput_commit_txn(). So a corresponding free of the memory looked
> > > necessary in the two-phase commit functions.
> > >
> >
> > Actually it is required for begin_prepare to set up the data, so that the
> > checks in pgoutput_change can make sure that the begin prepare is sent.
> > I've also added a free in the commit_prepared code.
> Okay, but if we choose the design where this patch takes
> care of the initialization in pgoutput_begin_prepare_txn(),
> we need another free in pgoutput_rollback_prepared_txn().
> Could you please add some code similar to pgoutput_commit_prepared_txn()
> there?
> If we simply execute rollback prepared for a non-streaming transaction,
> we don't free it.
>

Fixed.
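
(For illustration, the rollback path would presumably mirror the
commit-prepared one; a sketch using the usual pgoutput callback signature,
not the actual patch code:)

/* Hypothetical sketch: free the per-transaction data on ROLLBACK PREPARED,
 * just as pgoutput_commit_prepared_txn() now does on COMMIT PREPARED. */
static void
pgoutput_rollback_prepared_txn(LogicalDecodingContext *ctx,
							   ReorderBufferTXN *txn,
							   XLogRecPtr prepare_end_lsn,
							   TimestampTz prepare_time)
{
	PGOutputTxnData *txndata = (PGOutputTxnData *) txn->output_plugin_private;

	if (txndata)
	{
		pfree(txndata);
		txn->output_plugin_private = NULL;
	}

	OutputPluginPrepareWrite(ctx, true);
	logicalrep_write_rollback_prepared(ctx->out, txn, prepare_end_lsn,
									   prepare_time);
	OutputPluginWrite(ctx, true);
}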

>
> Some other new minor comments.
>
> (a) can be "synchronous replication", instead of "Synchronous Replication"
>
> Looking at syncrep.c, we usually use the former in
> normal comments.
>
>  /*
> + * Check if Synchronous Replication is enabled
> + */

Fixed.

>
> (b) move the two lines below in pgoutput_truncate to the case where nrelids > 0.
>
> @@ -770,6 +850,7 @@ pgoutput_truncate(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
>   int nrelations, Relation relations[], ReorderBufferChange *change)
>  {
> 	PGOutputData *data = (PGOutputData *) ctx->output_plugin_private;
> +	PGOutputTxnData *txndata = (PGOutputTxnData *) txn->output_plugin_private;
> 	MemoryContext old;
> 	RelationSyncEntry *relentry;
> 	int			i;
> @@ -777,6 +858,9 @@ pgoutput_truncate(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
> 	Oid		   *relids;
> 	TransactionId xid = InvalidTransactionId;
>
> +	/* If not streaming, should have setup txndata as part of BEGIN/BEGIN PREPARE */
> +	Assert(in_streaming || txndata);
> +
>

Fixed.

> (c) fix the indentation done with spaces (for the one line of SyncRepEnabled)
>
> @@ -539,6 +538,15 @@ SyncRepReleaseWaiters(void)
>  }
>
>  /*
> + * Check if Synchronous Replication is enabled
> + */
> +bool
> +SyncRepEnabled(void)
> +{
> +    return SyncRepRequested() && ((volatile WalSndCtlData *) WalSndCtl)->sync_standbys_defined;
> +}
> +
> +/*
>
> This can be detected by git am.
>

Fixed.

regards,
Ajin Cherian
Fujitsu Australia


v18-0001-Skip-empty-transactions-for-logical-replication.patch
Description: Binary data


Re: drop tablespace failed when location contains .. on win32

2022-01-31 Thread Tom Lane
Michael Paquier  writes:
> Should we have tests for WIN32 (aka for drive letters and "//")?
> This could be split into its own separate test file to limit the
> damage with the alternate outputs, and the original complaint was from
> there.

I thought about it and concluded that the value couldn't justify
the pain-in-the-neck factor of adding a platform-specific variant
result file.  skip_drive() is pretty simple and decoupled from what
we're trying to test here, plus it hasn't changed in decades and
is unlikely to do so in future.
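
For reference, the function is essentially the following (a condensed
sketch of the WIN32 logic in src/port/path.c):

/* Skip over the drive part of a Windows path: either a "C:"-style drive
 * letter or a "//network/" share prefix. */
static const char *
skip_drive(const char *path)
{
	if (IS_DIR_SEP(path[0]) && IS_DIR_SEP(path[1]))
	{
		path += 2;
		while (*path && !IS_DIR_SEP(*path))
			path++;
	}
	else if (isalpha((unsigned char) path[0]) && path[1] == ':')
		path += 2;
	return path;
}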

regards, tom lane




Re: Make mesage at end-of-recovery less scary.

2022-01-31 Thread Pavel Borisov
>
> > This v8 is changed in...
> >
> > - Added tests to 011_crash_recovery.pl
> >
> > - Fixed a bug where the server emits "end-of-wal" messages even if it has
> >   emitted an error message for the same LSN.
> >
> > - Changed XLogReaderValidatePageHeader() so that it recognizes an
> >   empty page as end-of-WAL.
> >
> > - Made pg_waldump conscious of end-of-wal.
> >
> > While doing the last item, I noticed that pg_waldump shows the wrong
> > LSN as the error position.  Concretely, it emits the LSN of the last
> > valid WAL record as the error position.  I will post a bug-fix patch
> > for the issue after confirmation.
>
> I noticed that I added a useless error message "garbage record
> header", but it is a kind of invalid record length.  So I removed the
> message. That change makes the logic for EOW in ValidXLogRecordHeader
> and XLogReaderValidatePageHeader share the same flow.
>

Hi, Kyotaro!

I don't quite understand the meaning of this comment:
 /* it is completely zeroed, call it a day  */
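
(For context, the check that comment annotates presumably treats a page of
all zero bytes as end-of-WAL rather than as corruption; a rough sketch of
such a test, not the patch's actual code:)

static bool
page_is_all_zeros(const char *page, size_t len)
{
	for (size_t i = 0; i < len; i++)
	{
		if (page[i] != 0)
			return false;
	}
	return true;
}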

Please also run pgindent on your code.

Otherwise the new patch seems ok.

--
Best regards,
Pavel Borisov

Postgres Professional: http://postgrespro.com 


Re: refactoring basebackup.c

2022-01-31 Thread Jeevan Ladhe
Hi Robert,

I had an offline discussion with Dipesh, and he will be working on the
lz4 client side decompression part.

Please find the attached patch with the following changes:

> - Even if we were going to support LZ4 only on the server side, surely
> it's not right to refuse --compress lz4 and --compress client-lz4 at
> the parsing stage. I don't even think the message you added to main()
> is reachable.
>

I think you are right; I have removed the message and introduced
the Assert() again.

> - In the new test case you set decompress_flags but according to the
> documentation I have here, -m is for multiple files (and so should not
> be needed here) and -d is for decompression (which is what we want
> here). So I'm confused why this is like this.
>

As explained earlier, in the TAP test the 'lz4 -d base.tar.lz4' command was
sending the decompressed output to stdout. Now I have removed the '-m',
added '-d' for decompression, and also specified the target file explicitly
in the command.

Regards,
Jeevan Ladhe


v11-0001-Add-a-LZ4-compression-method-for-server-side-compres.patch
Description: Binary data


Re: Multiple Query IDs for a rewritten parse tree

2022-01-31 Thread Andrey V. Lepikhov

On 1/28/22 9:51 PM, Dmitry Dolgov wrote:
> On Fri, Jan 21, 2022 at 11:33:22AM +0500, Andrey V. Lepikhov wrote:
> > Registration of a queryId generator is implemented by analogy with the
> > extensible methods machinery.
>
> Why not more like suggested with stakind and slots in some data
> structure? All of those generators have to be iterated anyway, so not
> sure if a hash table makes sense.

Maybe, but it is not obvious: we don't really know how many extensions
might set a queryId. For example, adaptive planning extensions definitely
want to set a unique id (a simplistic counter, for example) to trace a
specific {query, plan} across all executions (remember the plancache too),
and they would register a personal generator for that purpose.
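
To make the shape concrete, a hypothetical registration API along those
lines (all names here are illustrative, not from the patch):

/* Hypothetical sketch of per-extension queryId generator registration. */
typedef int64 (*QueryIdGenerator) (Query *query, const char *query_string);

extern void RegisterQueryIdGenerator(const char *name,
									 QueryIdGenerator generator);

static int64
adaptive_query_id(Query *query, const char *query_string)
{
	static int64 counter = 0;

	return ++counter;			/* a simplistic per-execution counter */
}

void
_PG_init(void)
{
	RegisterQueryIdGenerator("adaptive_planner", adaptive_query_id);
}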



> > Also, I switched queryId to the int64 type and renamed it to
> > 'label'.
>
> A name with "id" in it would be better, I believe. "Label" could be
> thought of as "the query belongs to a certain category", while the
> purpose is identification.

I think that is not entirely true. The current jumbling generates a
non-unique queryId (intentionally, I hope), and pg_stat_statements uses
the queryId to group queries into classes.
To track a specific query along the execution path it makes additional
efforts (remembering the query nesting level, as an example).
BTW, before [1], I tried to improve the queryId so that it would be stable
under permutations of tables in the 'FROM' section and so on. That would
allow reducing the number of pg_stat_statements entries (a critical factor
when you use an ORM, like 1C for example).

So I think a queryId is both an id and a category.



> > 2. We need a custom queryId that is based on a generated queryId
> > (according to the logic of pg_stat_statements).
>
> Could you clarify?

pg_stat_statements uses the original queryId and changes it for a reason
(sometimes zeroing it, sometimes not). So you can't use this value in
another extension and be confident that you are using the original value
generated by JumbleQuery(). A custom queryId solves this problem.



> > 4. We should reserve a position for the default in-core generator.
>
> From the discussion above I was under the impression that the core
> generator should be distinguished by a predefined kind.

Yes, but I think we should have a range of values, enough for use in
third-party extensions.



> > 5. We should add an EXPLAIN hook, to allow an extension to print this
> > custom queryId.
>
> Why? It would make sense if the custom generation code were generating
> some complex structure, but the queryId itself is still a hash.

An extension can print not only the queryId but also an explanation of
its kind, and maybe apply additional logic.
Moreover, why shouldn't an extension be able to show some useful
monitoring data, collected during query execution, in verbose mode?
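
A hypothetical shape for such a hook (illustrative only, not a proposal of
concrete names):

/* Hypothetical sketch: a hook letting an extension print its custom
 * queryId (and extra per-query data) into EXPLAIN output. */
typedef void (*ExplainCustomId_hook_type) (ExplainState *es, Query *query);

ExplainCustomId_hook_type ExplainCustomId_hook = NULL;

static void
show_custom_query_id(ExplainState *es, Query *query)
{
	if (ExplainCustomId_hook)
		(*ExplainCustomId_hook) (es, query);
}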


[1] 
https://www.postgresql.org/message-id/flat/e50c1e8f-e5d6-5988-48fa-63dd992e9565%40postgrespro.ru

--
regards,
Andrey Lepikhov
Postgres Professional




Re: CREATEROLE and role ownership hierarchies

2022-01-31 Thread Michael Banck
Hi,

Am Sonntag, dem 30.01.2022 um 17:11 -0800 schrieb Mark Dilger:
> > On Jan 30, 2022, at 2:38 PM, Michael Banck <   
> > michael.ba...@credativ.de> wrote:
> > > The attached WIP patch attempts to solve most of the CREATEROLE
> 
> I'm mostly looking for whether the general approach in this Work In
> Progress patch is acceptable, so I was a bit sloppy with whitespace
> and such

Ok, sure. I think this topic is hugely important, and as I read the
patch anyway, I added some comments; but yeah, we need to figure out
the fundamentals first.
> 

> > One thing I noticed (and which will likely make DBAs grumpy) is that it
> > seems being able to create users (as opposed to non-login roles/groups)
> > depends on when you get the CREATEROLE attribute (on role creation or
> > later), viz:
> > 
> > postgres=# CREATE USER admin CREATEROLE;
> > CREATE ROLE
> > postgres=# SET ROLE admin;
> > SET
> > postgres=> CREATE USER testuser; -- this works
> > CREATE ROLE
> > postgres=> RESET ROLE;
> > RESET
> > postgres=# CREATE USER admin2;
> > CREATE ROLE
> > postgres=# ALTER ROLE admin2 CREATEROLE; -- we get CREATEROLE after the fact
> > ALTER ROLE
> > postgres=# SET ROLE admin2;
> > SET
> > postgres=> CREATE USER testuser2; -- bam
> > ERROR:  must have grant option on LOGIN privilege to create login users
> > postgres=# SELECT rolname, admcreaterole, admcanlogin FROM
> > pg_authid
> > WHERE rolname LIKE 'admin%';
> > rolname | admcreaterole | admcanlogin 
> > -+---+-
> > admin   | t | t
> > admin2  | f | f
> > (2 rows)
> > 
> > Is that intentional? If it is, I think it would be nice if this could be
> > changed, unless I'm missing some serious security concerns or so.
> 
> It's intentional, but part of what I wanted review comments about. 
> The issue is that historically:
> 
>   CREATE USER michael CREATEROLE
> 
> meant that you could go on to do things like create users with LOGIN
> privilege.  I could take that away, which would be a backwards
> compatibility break, or I can do the weird thing this patch does.  Or
> I could have your
> 
>   ALTER ROLE admin2 CREATEROLE;
> 
> also grant the other privileges like LOGIN unless you explicitly say
> otherwise with a bunch of explicit WITHOUT ADMIN OPTION clauses. 
> Finding out which of those this is preferred was a big part of why I
> put this up for review.  Thanks for calling it out in under 24 hours!

Ok, so what I would have needed to do in the above in order to have
"admin2" and "admin" be the same as far as creating login users is (I
believe):

ALTER ROLE admin2 CREATEROLE LOGIN WITH ADMIN OPTION;

I think it would be nice to just have this part as the default, if
possible. I.e. CREATEROLE and HASLOGIN are historically so much
intertwined that I think the above should be implicit (again, if that
is possible); I don't care and/or haven't made up my mind about any of
the other options so far...

Ok, so now that I had another look, I see we are opening Pandora's
box: for any of the other things a role admin would like to do (change
password, change connection limit), one would have to go with this weird
disconnect between CREATE USER admin CREATEROLE and ALTER USER admin2
CREATEROLE [massive list of WITH ADMIN OPTION], and then I'm not sure
where we stop.

By the way, is there now even a way to add admpassword to a role after
it got created?

postgres=# SET ROLE admin2;
SET
postgres=> \password test
Enter new password for user "test": 
Enter it again: 
ERROR:  must have admin on password to change password attribute
postgres=> RESET ROLE;
RESET
postgres=# ALTER ROLE admin2 PASSWORD WITH ADMIN OPTION;
ERROR:  syntax error at or near "WITH"
postgres=# UPDATE pg_authid SET admpassword = 't' WHERE rolname = 'admin2';
UPDATE 1
postgres=# SET ROLE admin2;
SET
postgres=> \password test
Enter new password for user "test": 
Enter it again: 
postgres=> 

However, the next thing is:

postgres=# SET ROLE admin;
SET
postgres=> CREATE GROUP testgroup;
CREATE ROLE
postgres=> GRANT testgroup TO test;
ERROR:  must have admin option on role "testgroup"

First off, what does "admin option" mean on a role?

I then tried this:

postgres=# CREATE USER admin3 CREATEROLE WITH ADMIN OPTION;
CREATE ROLE
postgres=# SET ROLE admin3;
SET
postgres=> CREATE USER test3;
CREATE ROLE
postgres=> CREATE GROUP testgroup3;
CREATE ROLE
postgres=> GRANT testgroup3 TO test3;
ERROR:  must have admin option on role "testgroup3"

So I created both user and group, I have the CREATEROLE priv (with or
without admin option), but I still can't assign the group. Is that
(tracking who created a role and letting the creator do more thing) the
part that got chopped away in your last patch in order to find a common
ground?

Is there now any way non-Superusers can assign groups to other users? I
feel this (next to creating users/groups) is the primary thing those
CREATEROLE admins are supposed to do/were doing up to now.


Again, sorry if this was all discussed 

Re: [PATCH] Accept IP addresses in server certificate SANs

2022-01-31 Thread Kyotaro Horiguchi
At Thu, 6 Jan 2022 00:02:27 +, Jacob Champion  wrote 
in 
> On Mon, 2022-01-03 at 16:19 +, Jacob Champion wrote:
> > On Fri, 2021-12-17 at 15:40 +0900, Kyotaro Horiguchi wrote:
> > > 
> > > + inet_net_pton_ipv4(const char *src, u_char *dst)
> > >  (calls inet_net_pton_ipv4_internal(src, dst, true))
> > > + inet_pton_ipv4(const char *src, u_char *dst)
> > >  (calls inet_net_pton_ipv4_internal(src, dst, false))
> > 
> > Sounds good, I will make that change. Thanks for the feedback!
> 
> v3 implements a pg_inet_pton(), but for IPv6 instead of IPv4 as
> presented above (since we only need inet_pton() for IPv6 in this case).
> It's split into a separate patch (0003) for ease of review.

0001 looks fine, as it is in almost the same shape as inet_net_pton
regarding PGSQL_AF_INET and PGSQL_AF_INET6.  I'm not sure about the
difference in how pg_inet_net_pton and ntop handle AF_INET6,
but that's not a matter for this patch.

However, 0002,

+/*
+ * In a frontend build, we can't include inet.h, but we still need to have
+ * sensible definitions of these two constants.  Note that pg_inet_net_ntop()
+ * assumes that PGSQL_AF_INET is equal to AF_INET.
+ */
+#define PGSQL_AF_INET  (AF_INET + 0)
+#define PGSQL_AF_INET6 (AF_INET + 1)
+

Now we have the same definition three times in frontend code. Couldn't we
define them in, say, libpq-fe.h or inet-fe.h (nonexistent) and then
include it from the three files?
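
E.g., a tiny shared header along the lines suggested above ("inet-fe.h" is
the hypothetical name):

/* inet-fe.h -- hypothetical shared frontend header.  Frontend code can't
 * include inet.h, so define the PGSQL_AF_* constants here once and include
 * this file from the three call sites. */
#ifndef INET_FE_H
#define INET_FE_H

#define PGSQL_AF_INET	(AF_INET + 0)
#define PGSQL_AF_INET6	(AF_INET + 1)

#endif							/* INET_FE_H */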


+$node->connect_fails(
+   "$common_connstr host=192.0.2.2",
+   "host not matching an IPv4 address (Subject Alternative Name 1)",

It is not the real IP address of the server.

https://datatracker.ietf.org/doc/html/rfc6125
> In some cases, the URI is specified as an IP address rather than a
> hostname.  In this case, the iPAddress subjectAltName must be
> present in the certificate and must exactly match the IP in the URI.

When an IP address is embedded in a URI, it won't be translated to another
IP address. Concretely, https://192.0.1.5/hoge cannot reach the host
192.0.1.8.  On the other hand, as done in the test, libpq allows that
when "host=192.0.1.5 hostaddr=192.0.1.8".  I can't understand what we
are doing in that case.  Don't we need to match the SAN IP address
against hostaddr instead of host?
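
I.e., presumably something like this when choosing the comparison target
(a sketch using libpq's existing connhost fields, not the patch's code):

/* Hypothetical sketch: match an iPAddress SAN against the address we
 * actually connected to (hostaddr) when one was given, else against host. */
const char *match_target;

if (conn->connhost[conn->whichhost].hostaddr != NULL &&
	conn->connhost[conn->whichhost].hostaddr[0] != '\0')
	match_target = conn->connhost[conn->whichhost].hostaddr;
else
	match_target = conn->connhost[conn->whichhost].host;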

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: drop tablespace failed when location contains .. on win32

2022-01-31 Thread Michael Paquier
On Sun, Jan 30, 2022 at 04:50:03PM -0500, Tom Lane wrote:
> Here's a revised patch version that does it like that.  I also
> reviewed and simplified the canonicalize_path logic.  I think
> this is committable.

Thanks for the updated version.  The range of the tests looks fine
enough, and the CF bot does not complain.  The code is
straight-forward and pretty clear in terms of the handling of ".",
".." and the N-depth handling necessary.

Should we have tests for WIN32 (aka for drive letters and "//")?
This could be split into its own separate test file to limit the
damage with the alternate outputs, and the original complaint was from
there.

> (I suspect that adminpack's checks for unsafe file names could
> now be simplified substantially, because many of the corner cases
> it worries about are no longer possible, as evidenced by the change
> in error message there.  I've not pursued that, however.)

Fine by me to let this part for later.
--
Michael

