Re: vac_truncate_clog()'s bogus check leads to bogusness

2023-06-22 Thread Noah Misch
On Thu, Jun 22, 2023 at 09:45:18AM -0700, Andres Freund wrote:
> On 2023-06-21 21:50:39 -0700, Noah Misch wrote:
> > On Wed, Jun 21, 2023 at 03:12:08PM -0700, Andres Freund wrote:
> > > When vac_truncate_clog() returns early
> > ...
> > > we haven't released the lwlock that we acquired earlier
> > 
> > > Until there's some cause for the session to call LWLockReleaseAll(), the 
> > > lock
> > > is held. Until then neither the process holding the lock, nor any other
> > > process, can finish vacuuming.  We don't even have an assert against a
> > > self-deadlock with an already held lock, oddly enough.
> > 
> > I agree with this finding.  Would you like to add the lwlock releases, or
> > would you like me to?
> 
> Happy with either.  I do have code and testcase, so I guess it would make
> sense for me to do it?

Sounds good.  Thanks.




Re: Improve logging when using Huge Pages

2023-06-22 Thread Michael Paquier
On Tue, Jun 20, 2023 at 06:44:20PM -0500, Justin Pryzby wrote:
> On Tue, Jun 13, 2023 at 02:50:30PM +0900, Michael Paquier wrote:
>> On Mon, Jun 12, 2023 at 02:37:15PM -0700, Nathan Bossart wrote:
>> > Fair enough.  I know I've been waffling in the GUC versus function
>> > discussion, but FWIW v7 of the patch looks reasonable to me.
>> 
>> +   Assert(strcmp("unknown", GetConfigOption("huge_pages_status", false, 
>> false)) != 0);
>> 
>> Not sure that there is anything to gain with this assertion in
>> CreateSharedMemoryAndSemaphores(), because this is pretty much what
>> check_GUC_init() looks after?
> 
> It seems like you misread the assertion, so I added a comment about it.
> Indeed, the assertion addresses the other question you asked later.
> 
> That's what I already commented about, and the reason I found it
> compelling not to use a boolean.

Apologies for the late reply here.

At the end, I am on board with the addition of this assertion and its
position after PGSharedMemoryCreate().

I would also move the SetConfigOption() for the WIN32 path after ew
have passed all the checks.  There are a few FATALs that can be
triggered so it would be a waste to call it if we are going to fail
the shmem creation in this path.

I could not resist adding two checks in the TAP tests to make sure
that we don't report unknown.  Perhaps that's not necessary, but that
would provide coverage in a more broader way, and these are cheap.

I have run one indentation, while on it.

Note to self: check that manually on Windows.
--
Michael
From 2ef2b73215b5775bf6d44fa9181de2cf9a379fc7 Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Fri, 23 Jun 2023 13:56:04 +0900
Subject: [PATCH v9] add GUC: huge_pages_status

This is useful to show the current state of huge pages when
huge_pages=try.  The effective status is not otherwise visible without
OS level tools like gdb or /proc/N/smaps.  Like the other GUCs related
to huge pages, this works for Linux and Windows.
---
 src/include/storage/pg_shmem.h|  5 +++--
 src/backend/port/sysv_shmem.c | 14 ++
 src/backend/port/win32_shmem.c|  5 +
 src/backend/storage/ipc/ipci.c|  7 +++
 src/backend/utils/misc/guc_tables.c   | 19 +++
 src/test/authentication/t/003_peer.pl |  6 ++
 src/test/authentication/t/005_sspi.pl |  4 
 doc/src/sgml/config.sgml  | 23 ++-
 8 files changed, 80 insertions(+), 3 deletions(-)

diff --git a/src/include/storage/pg_shmem.h b/src/include/storage/pg_shmem.h
index 4dd05f156d..ba0cdc13c7 100644
--- a/src/include/storage/pg_shmem.h
+++ b/src/include/storage/pg_shmem.h
@@ -46,12 +46,13 @@ extern PGDLLIMPORT int shared_memory_type;
 extern PGDLLIMPORT int huge_pages;
 extern PGDLLIMPORT int huge_page_size;
 
-/* Possible values for huge_pages */
+/* Possible values for huge_pages and huge_pages_status */
 typedef enum
 {
 	HUGE_PAGES_OFF,
 	HUGE_PAGES_ON,
-	HUGE_PAGES_TRY
+	HUGE_PAGES_TRY,/* only for huge_pages */
+	HUGE_PAGES_UNKNOWN			/* only for huge_pages_status */
 }			HugePagesType;
 
 /* Possible values for shared_memory_type */
diff --git a/src/backend/port/sysv_shmem.c b/src/backend/port/sysv_shmem.c
index eaba244bc9..f1eb5a1e20 100644
--- a/src/backend/port/sysv_shmem.c
+++ b/src/backend/port/sysv_shmem.c
@@ -627,6 +627,14 @@ CreateAnonymousSegment(Size *size)
 	}
 #endif
 
+	/*
+	 * Report whether huge pages are in use.  This needs to be tracked before
+	 * the second mmap() call if attempting to use huge pages failed
+	 * previously.
+	 */
+	SetConfigOption("huge_pages_status", (ptr == MAP_FAILED) ? "off" : "on",
+	PGC_INTERNAL, PGC_S_DYNAMIC_DEFAULT);
+
 	if (ptr == MAP_FAILED && huge_pages != HUGE_PAGES_ON)
 	{
 		/*
@@ -737,8 +745,14 @@ PGSharedMemoryCreate(Size size,
 		sysvsize = sizeof(PGShmemHeader);
 	}
 	else
+	{
 		sysvsize = size;
 
+		/* huge pages are only available with mmap */
+		SetConfigOption("huge_pages_status", "off",
+		PGC_INTERNAL, PGC_S_DYNAMIC_DEFAULT);
+	}
+
 	/*
 	 * Loop till we find a free IPC key.  Trust CreateDataDirLockFile() to
 	 * ensure no more than one postmaster per data directory can enter this
diff --git a/src/backend/port/win32_shmem.c b/src/backend/port/win32_shmem.c
index 62e0807477..05494c14a9 100644
--- a/src/backend/port/win32_shmem.c
+++ b/src/backend/port/win32_shmem.c
@@ -401,6 +401,11 @@ retry:
 	on_shmem_exit(pgwin32_SharedMemoryDelete, PointerGetDatum(hmap2));
 
 	*shim = hdr;
+
+	/* Report whether huge pages are in use */
+	SetConfigOption("huge_pages_status", (flProtect & SEC_LARGE_PAGES) ?
+	"on" : "off", PGC_INTERNAL, PGC_S_DYNAMIC_DEFAULT);
+
 	return hdr;
 }
 
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 8f1ded7338..cc387c00a1 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -190,6 +190,13 @@ CreateSharedMemoryAndSemaphores(void)
 		 */
 		seghdr = PGSharedMemoryCreate(size, );
 
+		/*
+		 * Make sure 

Re: Deleting prepared statements from libpq.

2023-06-22 Thread Michael Paquier
On Tue, Jun 20, 2023 at 01:42:13PM +0200, Jelte Fennema wrote:

Thanks for updating the patch.

> On Tue, 20 Jun 2023 at 06:18, Michael Paquier  wrote:
>> The amount of duplication between the describe and close paths
>> concerns me a bit.  Should PQsendClose() and PQsendDescribe() be
>> merged into a single routine (say PQsendCommand), that uses a message
>> type for pqPutMsgStart and a queryclass as extra arguments?
> 
> Done (I called it PQsendTypedCommand)

Okay by me to use this name.

I have a few comments about the tests.  The docs and the code seem to
be in line.

+   if (PQsendClosePrepared(conn, "select_one") != 1)
+   pg_fatal("PQsendClosePortal failed: %s", PQerrorMessage(conn));
+   if (PQpipelineSync(conn) != 1)
+   pg_fatal("pipeline sync failed: %s", PQerrorMessage(conn));
+
+   res = PQgetResult(conn);
+   if (res == NULL)
+   pg_fatal("expected non-NULL result");
+   if (PQresultStatus(res) != PGRES_COMMAND_OK)
+   pg_fatal("expected COMMAND_OK, got %s", 
PQresStatus(PQresultStatus(res)));
+   PQclear(res);
+   res = PQgetResult(conn);
+   if (res != NULL)
+   pg_fatal("expected NULL result");
+   res = PQgetResult(conn);
+   if (PQresultStatus(res) != PGRES_PIPELINE_SYNC)
+   pg_fatal("expected PGRES_PIPELINE_SYNC, got %s", 
PQresStatus(PQresultStatus(res)));

Okay, so here this checks that a PQresult is returned on the first
call, and that NULL is returned on the second call.  Okay with that.
Perhaps this should add a fprintf(stderr, "closing statement..") at
the top of the test block?  That's a minor point.

+   /*
+* Also test the blocking close, this should not fail since closing a
+* non-existent prepared statement is a no-op
+*/
+   res = PQclosePrepared(conn, "select_one");
+   if (PQresultStatus(res) != PGRES_COMMAND_OK)
+   pg_fatal("expected COMMAND_OK, got %s", 
PQresStatus(PQresultStatus(res)));
[...]
res = PQgetResult(conn);
if (res == NULL)
-   pg_fatal("expected NULL result");
+   pg_fatal("expected non-NULL result");

This should check for the NULL-ness of the result returned for
PQclosePrepared() rather than changing the behavior of the follow-up
calls?

+   if (PQsendClosePortal(conn, "cursor_one") != 1)
+   pg_fatal("PQsendClosePortal failed: %s", PQerrorMessage(conn));
+   if (PQpipelineSync(conn) != 1)
+   pg_fatal("pipeline sync failed: %s", PQerrorMessage(conn));
Perhaps I would add an extra fprint about the portal closing test,
just for clarity of the test flow.

@@ -2534,6 +2615,7 @@ sendFailed:
return 0;
 }
 
+
 /*

Noise diff.
--
Michael


signature.asc
Description: PGP signature


Re: Preventing non-superusers from altering session authorization

2023-06-22 Thread Michał Kłeczek
Hi,

I’ve just stumbled upon this patch and thread and thought I could share an idea 
of adding an optional temporary secret to SET SESSION AUTHORIZATION so that it 
is only possible to RESET SESSION AUTHORIZATION by providing the same secret 
,like:

SET SESSION AUTHORIZATION [role] GUARDED BY ‘[secret]’;

...

RESET SESSION AUTHORIZATION WITH ‘[secret]’;


The use case is: I have a set of Liquibase scripts I would like to execute as a 
different role each and make sure they cannot escape the sandbox.

As I am not a Postgres hacker I wonder how difficult to implement it might be…

Thanks,
Michal

> On 23 Jun 2023, at 00:39, Joseph Koshakow  wrote:
> 
> 
> 
> On Wed, Jun 21, 2023 at 11:48 PM Nathan Bossart  > wrote:
> >
> >On Wed, Jun 21, 2023 at 04:28:43PM -0400, Joseph Koshakow wrote:
> >> + roleTup = SearchSysCache1(AUTHOID, 
> > ObjectIdGetDatum(AuthenticatedUserId));
> >> + if (!HeapTupleIsValid(roleTup))
> >> + ereport(FATAL,
> >> + 
> > (errcode(ERRCODE_INVALID_AUTHORIZATION_SPECIFICATION),
> >> + errmsg("role with OID %u 
> > does not exist", AuthenticatedUserId)));
> >> + rform = (Form_pg_authid) GETSTRUCT(roleTup);
> >
> >I think "superuser_arg(AuthenticatedUserId)" would work here.
> 
> Yep, that worked. I've attached a patch with this change.
> 
> > I see that RESET SESSION AUTHORIZATION
> > with a concurrently dropped role will FATAL with your patch but succeed
> > without it, which could be part of the reason.
> 
> That might be a good change? If the original authenticated role ID no
> longer exists then we may want to return an error when trying to set
> your session authorization to that role.
> 
> Thanks,
> Joe Koshakow
> 



Re: Skip collecting decoded changes of already-aborted transactions

2023-06-22 Thread Dilip Kumar
On Fri, Jun 9, 2023 at 10:47 AM Masahiko Sawada  wrote:
>
> Hi,
>
> In logical decoding, we don't need to collect decoded changes of
> aborted transactions. While streaming changes, we can detect
> concurrent abort of the (sub)transaction but there is no mechanism to
> skip decoding changes of transactions that are known to already be
> aborted. With the attached WIP patch, we check CLOG when decoding the
> transaction for the first time. If it's already known to be aborted,
> we skip collecting decoded changes of such transactions. That way,
> when the logical replication is behind or restarts, we don't need to
> decode large transactions that already aborted, which helps improve
> the decoding performance.
>
+1 for the idea of checking the transaction status only when we need
to flush it to the disk or send it downstream (if streaming in
progress is enabled).   Although this check is costly since we are
planning only for large transactions then it is worth it if we can
occasionally avoid disk or network I/O for the aborted transactions.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Things I don't like about \du's "Attributes" column

2023-06-22 Thread Tom Lane
Nearby I dissed psql's \du command for its incoherent "Attributes"
column [1].  It's too late to think about changing that for v16,
but here's some things I think we should consider for v17:

* It seems weird that some attributes are described in the negative
("Cannot login", "No inheritance").  I realize that this corresponds
to the defaults, so that a user created by CREATE USER with no options
shows nothing in the Attributes column; but I wonder how much that's
worth.  As far as "Cannot login" goes, you could argue that the silent
default ought to be for the properties assigned by CREATE ROLE, since
the table describes itself as "List of roles".  I'm not dead set on
changing this, but it seems like a topic that deserves a fresh look.

* I do not like the display of rolconnlimit, ie "No connections" or
"%d connection(s)".  A person not paying close attention might think
that that means the number of *current* connections the user has.
A minimal fix could be to word it like "No connections allowed" or
"%d connection(s) allowed".  But see below.

* I do not like the display of rolvaliduntil, either.  Consider

regression=# create user alice password 'secret';
CREATE ROLE
regression=# create user bob valid until 'infinity';
CREATE ROLE
regression=# \du
...
 alice |
 bob   | Password valid until infinity
...

This output claims that bob has an indefinitely-valid password, when in
fact he has no password at all.  On the other hand, nothing is said about
alice, who actually *does* have a password valid until infinity.  It's
difficult to imagine a more misleading way to present this.

Now, it's hard to do better given that the \du command is examining the
universally-readable pg_roles view, because that doesn't betray any hint
of whether the user has a password or not.  I wonder though what is the
rationale for letting unprivileged users see the rolvaliduntil column
but not whether a password exists at all.  I suggest that maybe it'd
be okay to change the pg_roles view along the lines of

-   ''::text as rolpassword,
+   case when rolpassword is not null then ''::text end as 
rolpassword,

Then we could fix \du to say nothing if rolpassword is null,
and when it isn't, print "Password valid until infinity" whenever
that is the case (ie, rolvaliduntil is null or infinity).

* On a purely presentation level, how did we possibly arrive
at the idea that the connection-limit and valid-until properties
deserve their own lines in the Attributes column while the other
properties are comma-separated?  That makes no sense whatsoever,
nor does it look nice in \x display format.

I do grasp the distinction that the other properties are permission
bits while these two aren't, but that doesn't naturally lead to
this formatting.  I'd vote for either

(a) each property gets its own line, or

(b) move these two things into separate columns.  Some of the
verbiage could then be dropped in favor of the column title.

Right now (b) would lead to an undesirably wide table; but
if we push the "Member of" column out to a different \d command
as discussed in the other thread, maybe it'd be practical.

Anyway, for now I'm just throwing this topic out for discussion.

regards, tom lane

[1] https://www.postgresql.org/message-id/4128935.1687478926%40sss.pgh.pa.us




Re: psql: Add role's membership options to the \du+ command

2023-06-22 Thread Tom Lane
"Jonathan S. Katz"  writes:
> On 6/15/23 2:47 PM, David G. Johnston wrote:
>> Robert - can you please comment on what you are willing to commit in 
>> order to close out your open item here.  My take is that the design for 
>> this, the tabular form a couple of emails ago (copied here), is 
>> ready-to-commit, just needing the actual (trivial) code changes to be 
>> made to accomplish it.

> Can we resolve this before Beta 2?[1] The RMT originally advised to try 
> to resolve before Beta 1[2], and this seems to be lingering.

At this point I kinda doubt that we can get this done before beta2
either, but I'll put in my two cents anyway:

* I agree that the "tabular" format looks nicer and has fewer i18n
issues than the other proposals.

* Personally I could do without the "empty" business, but that seems
unnecessary in the tabular format; an empty column will serve fine.

* I also agree with Pavel's comment that we'd be better off taking
this out of \du altogether and inventing a separate \d command.
Maybe "\drg" for "display role grants"?

* Parenthetically, the "Attributes" column of \du is a complete
disaster, lacking not only conceptual but even notational consistency.
(Who decided that some items belonged on their own line and others
not?)  I suppose it's way too late to redesign that for v16.  But
I think we'd have more of a free hand to clean that up if we weren't
trying to shoehorn role grants into the same display.

regards, tom lane




Re: Make pgbench exit on SIGINT more reliably

2023-06-22 Thread Michael Paquier
On Thu, Jun 22, 2023 at 02:58:14PM +0900, Yugo NAGATA wrote:
> On Mon, 19 Jun 2023 16:49:05 -0700
> "Tristan Partin"  wrote:
>> On Mon Jun 19, 2023 at 6:39 AM PDT, Yugo NAGATA wrote:
>>> [1] 
>>> https://www.postgresql.org/message-id/flat/CSTU5P82ONZ1.19XFUGHMXHBRY%40c3po
>> 
>> The other patch does not replace this one. They are entirely separate.
> 
> After applying the other patch, CancelRequested would be checked enough 
> frequently
> even during initialization of pgbench_branches and pgbench_tellers, and it 
> will
> allow the initialization step to be cancelled immediately even if the scale 
> factor
> is high. So, is it unnecessary to modify setup_cancel_handler anyway?
> 
> I think it would be still nice to introduce a new exit code for query cancel 
> separately.

I have the same impression as Nagata-san while going again through
the proposal of this thread.  COPY is more responsive to
interruptions, and is always going to have a much better performance 
than individual or multi-value INSERTs for the bulk-loading of data,
so we could just move on with what's proposed on the other thread and
solve the problem of this thread on top of improving the loading
performance.

What are the benefits we gain with the proposal of this thread once we
switch to COPY as method for the client-side data generation?  If
the argument is to be able switch to a different error code on
cancellations for pgbench, that sounds a bit thin to me versus the
argument of keeping the cancellation callback path as simple as
possible.
--
Michael


signature.asc
Description: PGP signature


Re: allow granting CLUSTER, REFRESH MATERIALIZED VIEW, and REINDEX

2023-06-22 Thread Nathan Bossart
On Thu, Jun 22, 2023 at 08:43:01AM -0700, Nathan Bossart wrote:
> I plan to commit these patches later today.

Committed.  I've also marked the related open item for v16 as resolved.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: Preventing non-superusers from altering session authorization

2023-06-22 Thread Joseph Koshakow
On Wed, Jun 21, 2023 at 11:48 PM Nathan Bossart 
wrote:
>
>On Wed, Jun 21, 2023 at 04:28:43PM -0400, Joseph Koshakow wrote:
>> + roleTup = SearchSysCache1(AUTHOID,
ObjectIdGetDatum(AuthenticatedUserId));
>> + if (!HeapTupleIsValid(roleTup))
>> + ereport(FATAL,
>> +
(errcode(ERRCODE_INVALID_AUTHORIZATION_SPECIFICATION),
>> + errmsg("role with OID
%u does not exist", AuthenticatedUserId)));
>> + rform = (Form_pg_authid) GETSTRUCT(roleTup);
>
>I think "superuser_arg(AuthenticatedUserId)" would work here.

Yep, that worked. I've attached a patch with this change.

> I see that RESET SESSION AUTHORIZATION
> with a concurrently dropped role will FATAL with your patch but succeed
> without it, which could be part of the reason.

That might be a good change? If the original authenticated role ID no
longer exists then we may want to return an error when trying to set
your session authorization to that role.

Thanks,
Joe Koshakow
From 2b2817e3ea4f1541a781216afb7415435ca362a0 Mon Sep 17 00:00:00 2001
From: Joseph Koshakow 
Date: Thu, 15 Jun 2023 14:53:11 -0400
Subject: [PATCH] Prevent non-superusers from altering session auth

Previously, if a user connected with as a role that had the superuser
attribute, then they could always execute a SET SESSION AUTHORIZATION
statement for the duration of their session. Even if the role was
altered to set superuser to false, the user was still allowed to
execute SET SESSION AUTHORIZATION. This allowed them to set their
session role to some other superuser and effectively regain the
superuser privileges. They could even reset their own superuser
attribute to true.

This commit alters the privilege checks for SET SESSION AUTHORIZATION
such that a user can only execute it if the role they connected with is
currently a superuser. This prevents users from regaining superuser
privileges after it has been revoked.
---
 doc/src/sgml/ref/set_session_auth.sgml |  2 +-
 src/backend/utils/init/miscinit.c  | 21 +++--
 2 files changed, 8 insertions(+), 15 deletions(-)

diff --git a/doc/src/sgml/ref/set_session_auth.sgml b/doc/src/sgml/ref/set_session_auth.sgml
index f8fcafc194..94adab2468 100644
--- a/doc/src/sgml/ref/set_session_auth.sgml
+++ b/doc/src/sgml/ref/set_session_auth.sgml
@@ -51,7 +51,7 @@ RESET SESSION AUTHORIZATION
 
   
The session user identifier can be changed only if the initial session
-   user (the authenticated user) had the
+   user (the authenticated user) has the
superuser privilege.  Otherwise, the command is accepted only if it
specifies the authenticated user name.
   
diff --git a/src/backend/utils/init/miscinit.c b/src/backend/utils/init/miscinit.c
index a604432126..4cef655703 100644
--- a/src/backend/utils/init/miscinit.c
+++ b/src/backend/utils/init/miscinit.c
@@ -467,7 +467,7 @@ ChangeToDataDir(void)
  * AuthenticatedUserId is determined at connection start and never changes.
  *
  * SessionUserId is initially the same as AuthenticatedUserId, but can be
- * changed by SET SESSION AUTHORIZATION (if AuthenticatedUserIsSuperuser).
+ * changed by SET SESSION AUTHORIZATION.
  * This is the ID reported by the SESSION_USER SQL function.
  *
  * OuterUserId is the current user ID in effect at the "outer level" (outside
@@ -492,8 +492,6 @@ static Oid	OuterUserId = InvalidOid;
 static Oid	CurrentUserId = InvalidOid;
 static const char *SystemUser = NULL;
 
-/* We also have to remember the superuser state of some of these levels */
-static bool AuthenticatedUserIsSuperuser = false;
 static bool SessionUserIsSuperuser = false;
 
 static int	SecurityRestrictionContext = 0;
@@ -731,6 +729,7 @@ InitializeSessionUserId(const char *rolename, Oid roleid)
 	HeapTuple	roleTup;
 	Form_pg_authid rform;
 	char	   *rname;
+	bool	   is_superuser;
 
 	/*
 	 * Don't do scans if we're bootstrapping, none of the system catalogs
@@ -770,10 +769,10 @@ InitializeSessionUserId(const char *rolename, Oid roleid)
 	rname = NameStr(rform->rolname);
 
 	AuthenticatedUserId = roleid;
-	AuthenticatedUserIsSuperuser = rform->rolsuper;
+	is_superuser = rform->rolsuper;
 
 	/* This sets OuterUserId/CurrentUserId too */
-	SetSessionUserId(roleid, AuthenticatedUserIsSuperuser);
+	SetSessionUserId(roleid, is_superuser);
 
 	/* Also mark our PGPROC entry with the authenticated user id */
 	/* (We assume this is an atomic store so no lock is needed) */
@@ -806,7 +805,7 @@ InitializeSessionUserId(const char *rolename, Oid roleid)
 		 * just document that the connection limit is approximate.
 		 */
 		if (rform->rolconnlimit >= 0 &&
-			!AuthenticatedUserIsSuperuser &&
+			!is_superuser &&
 			CountUserBackends(roleid) > rform->rolconnlimit)
 			ereport(FATAL,
 	(errcode(ERRCODE_TOO_MANY_CONNECTIONS),
@@ -818,7 +817,7 @@ InitializeSessionUserId(const char *rolename, Oid roleid)
 	SetConfigOption("session_authorization", rname,
 	PGC_BACKEND, PGC_S_OVERRIDE);
 	

Re: Do we want a hashset type?

2023-06-22 Thread Tomas Vondra
On 6/22/23 19:52, Joel Jacobson wrote:
> On Tue, Jun 20, 2023, at 14:10, Tomas Vondra wrote:
>> This is also what the SQL standard does for multisets - there's SQL:20nn
>> draft at http://www.wiscorp.com/SQLStandards.html, and the > predicate> section (p. 475) explains how this should work with NULL.
> 
> I've looked again at the paper you mentioned and found something intriguing
> in section 2.6 (b). I'm a bit puzzled about this: why would we want to return
> null when we're certain it's not null but just doesn't have any elements?
> 
> In the same vein, it says, "If it has more than one element, an exception is
> raised." Makes sense to me, but what about when there are no elements at all?
> Why not raise an exception in that case too?
> 
> The ELEMENT function is designed to do one simple thing: return the element of
> a multiset if the multiset has only 1 element. This seems very similar to how
> our INTO STRICT operates, right?
> 

I agree this looks a bit weird, but that's what I mentioned - this is an
initial a proposal, outlining the idea. Inevitably some of the stuff
will get reworked or just left out of the final version. It's useful
mostly to explain the motivation / goal.

I believe that's the case here - I don't think the ELEMENT got into the
standard at all, and the NULL rules for the MEMBER OF clause seem not to
have these strange bits.

> The SQL:20nn seems to still be in draft form, and I can't help but wonder if 
> we
> should propose a bit of an improvement here:
> 
> "If it doesn't have exactly one element, an exception is raised."
> 
> Meaning, it would raise an exception both if there are more elements,
> or zero elements (no elements).
> 
> I think this would make the semantics more intuitive and less surprising.
> 

Well, the simple truth is the draft is freely available, but you'd need
to buy the final version. It doesn't mean it's still being worked on or
that no SQL standard was released since then. In fact, SQL 2023 was
released a couple weeks ago [1].

It'd be interesting to know the version that actually got into the SQL
standard (if at all), but I don't have access to the standard yet.

regards


[1] https://www.iso.org/standard/76584.html

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Remove deprecation warnings when compiling PG ~13 with OpenSSL 3.0~

2023-06-22 Thread Michael Paquier
On Thu, Jun 22, 2023 at 08:08:54PM +0200, Peter Eisentraut wrote:
> The message linked to above also says:
> 
>> I'm not sure.  I don't have a good sense of what OpenSSL versions we
>> claim to support in branches older than PG13.  We made a conscious
>> decision for 1.0.1 in PG13, but I seem to recall that that discussion
>> also revealed that the version assumptions before that were quite
>> inconsistent.  Code in PG12 and before makes references to OpenSSL as
>> old as 0.9.6.  But OpenSSL 3.0.0 will reject a compat level older than
>> 0.9.8.

Well, I highly doubt that anybody has tried to compile Postgres 12
with OpenSSL 0.9.7 for a few years.  If they attempt to do so, the
compilation fails:
: note: this is the location of the previous definition
In file included from ../../src/include/common/scram-common.h:16,
 from scram-common.c:23:
../../src/include/common/sha2.h:73:9: error: unknown type name ‘SHA256_CTX’
   73 | typedef SHA256_CTX pg_sha256_ctx;

One reason is that SHA256_CTX is defined in OpenSSL 0.9.8
crypto/sha/sha.h, but this exists only in fips-1.0 in OpenSSL 0.9.7,
while we rely on SHA256_CTX in src/common/ since SCRAM exists.

Also, note that the documentation claims that the minimum version of
OpenSSL supported is 0.9.8, which is something that commit 9b7cd59 has
done, impacting Postgres 10~.  So your argument looks incorrect to me?

Honestly, I see no reason to not move on with this and remove these
deprecation warnings as proposed by the last patches sent.  (I have
run builds with 0.9.8, FWIW.)
--
Michael


signature.asc
Description: PGP signature


Re: pg_collation.collversion for C.UTF-8

2023-06-22 Thread Thomas Munro
On Tue, Jun 20, 2023 at 6:48 AM Jeff Davis  wrote:
> On Sat, 2023-06-17 at 17:54 +1200, Thomas Munro wrote:
> > > Would it be correct to interpret LC_COLLATE=C.UTF-8 as
> > > LC_COLLATE=C,
> > > but leave LC_CTYPE=C.UTF-8 as-is?
> >
> > Yes.  The basic idea, at least for these two OSes, is that every
> > category behaves as if set to C, except LC_CTYPE.
>
> If that's true, and we version C.UTF-8, then users could still get the
> behavior they want, a stable collation order, and benefit from the
> optimized code path by setting LC_COLLATE=C and LC_CTYPE=C.UTF-8.
>
> The only caveat is to be careful with things that depend on ctype in
> indexes and constraints. While still a problem, it's a smaller problem
> than unversioned collation. We should think a little more about solving
> it, because I think there's a strong case to be made that a default
> collation of C and a database ctype of something else is a good
> combination (it makes less sense for a case-insensitive collation, but
> those aren't allowed as a default collation).
>
> In any case, we're better off following the rule "version anything that
> goes to any external provider, period". And by "version", I really mean
> a best effort, because we don't always have great information, but I
> think it's better to record what we do have than not. We have just seen
> too many examples of weird behavior. On top of that, it's simply
> inconsistent to assume that C=C.UTF-8 for collation version, but not
> for the collation implementation.

Yeah, OK, you're convincing me.  It's hard to decide because our model
is basically wrong so it's only warning you about potential ctype
changes by happy coincidence, but even in respect of sort order it was
probably a mistake to start second-guessing what libc is doing, and
with that observation about the C/C.UTF-8 combination, at least an
end-user has a way to opt in/out of this choice.  I'll try to write a
concise commit message for Daniel's patch explaining all this and we
can see about squeaking it into beta2.

> Use rs might get frustrated that the collation for C.UTF-8 is versioned,
> of course. But I don't think it will affect anyone for quite some time,
> because existing users will have a datcollversion=NULL; so they won't
> get the warnings until they refresh the versions (or create new
> collations/databases), and then after that upgrade libc. Right? So they
> should have time to adjust to use LC_COLLATE=C if that's what they
> want.

Yeah.

> An alternative would be to define lc_collate_is_c("C.UTF-8") == true
> while leaving lc_ctype_is_c("C.UTF-8") == false and
> get_collation_actual_version("C.UTF-8") == NULL. In that case we would
> not be passing it to an external provider, so we don't have to version
> it. But that might be a little too magical and I'm not inclined to do
> that.

Agreed, let's not do any more of that sort of thing.

> Another alternative would be to implement C.UTF-8 internally according
> to the "true" semantics, if they are truly simple and well-defined and
> stable. But I don't think ctype=C.UTF-8 is actually stable because new
> characters can be added, right?

Correct.




Re: Bytea PL/Perl transform

2023-06-22 Thread Greg Sabino Mullane
>
> So I decided to propose a simple transform extension to pass bytea as
> native Perl octet strings.


Quick review, mostly housekeeping things:

* Needs a rebase, minor failure on Mkvcbuild.pm
* Code needs standardized formatting, esp. bytea_plperl.c
* Needs to be meson-i-fied (i.e. add a "meson.build" file)
* Do all of these transforms need to be their own contrib modules? So much
duplicated code across contrib/*_plperl already (and *plpython too for that
matter) ...

Cheers,
Greg


Re: [PATCH] pg_regress.c: Fix "make check" on Mac OS X: Pass DYLD_LIBRARY_PATH

2023-06-22 Thread David Zhang
After conducting a further investigation into this issue, I have made 
some discoveries. The previous patch successfully resolves the problem 
when running the commands `./configure && make && make check` (without 
any previous sudo make install or make install). However, it stops at 
the 'isolation check' when using the commands `./configure 
--enable-tap-tests && make && make check-world`.


To address this, I attempted to apply a similar approach as the previous 
patch, resulting in an experimental patch (attached). This new patch 
helps progress the 'make-world' process and passes the 'isolation 
check', but there are still several remaining issues that need to be 
addressed.



Currently, there is a description suggesting a workaround by running a 
'make install' command first, but I find it to be somewhat inaccurate. 
It would be better to update the existing description to provide more 
precise instructions on how to overcome this issue. Here are the changes 
I would suggest.


from:
"You can work around that by doing make install before make check. Most 
PostgreSQL developers just turn off SIP, though."


to:
"You can execute sudo make install if you do not specify a prefix during 
the configure step, or make install without sudo if you do specify a 
prefix (assuming proper permissions) before make check. Most PostgreSQL 
developers just turn off SIP, though."


Otherwise, following the current description, if you run `./configure  
&& make install` you will get error: "mkdir: /usr/local/pgsql: 
Permission denied"



Below are the steps I took that led to the discovery of additional issues.

git apply  pg_regress_mac_os_x_dyld.patch
./configure
make
make check

... ...
# All 215 tests passed.


./configure --enable-tap-tests
make
make check-world

... ...
echo "# +++ isolation check in src/test/isolation +++"
... ...
dyld[32335]: Library not loaded: /usr/local/pgsql/lib/libpq.5.dylib
  Referenced from:  
/Users/david/hg/sandbox/postgres/src/test/isolation/isolationtester
  Reason: tried: '/usr/local/pgsql/lib/libpq.5.dylib' (no such file), 
'/System/Volumes/Preboot/Cryptexes/OS/usr/local/pgsql/lib/libpq.5.dylib' 
(no such file), '/usr/local/pgsql/lib/libpq.5.dylib' (no such file), 
'/usr/local/lib/libpq.5.dylib' (no such file), '/usr/lib/libpq.5.dylib' 
(no such file, not in dyld cache)
no data was returned by command 
""/Users/david/hg/sandbox/postgres/src/test/isolation/isolationtester" -V"




git apply pg_regress_mac_os_x_dyld_isolation_check_only.patch
./configure --enable-tap-tests
make
make check-world

... ...
# All 215 tests passed.
... ...
# +++ isolation check in src/test/isolation +++
... ...
# All 112 tests passed.

echo "# +++ tap check in src/test/modules/brin +++"
... ...
# +++ tap check in src/test/modules/brin +++
t/01_workitems.pl  Bailout called.  Further testing stopped:  
command "initdb -D 
/Users/david/hg/sandbox/postgres/src/test/modules/brin/tmp_check/t_01_workitems_tango_data/pgdata 
-A trust -N" died with signal 6

t/01_workitems.pl  Dubious, test returned 255 (wstat 65280, 0xff00)
No subtests run


Any thoughts ?

Thank you

David

On 2023-06-16 2:25 p.m., David Zhang wrote:

I have applied the patch to the latest master branch and successfully executed './configure && 
make && make check' on macOS Ventura. However, during the process, a warning was encountered: 
"mixing declarations and code is incompatible with standards before C99 
[-Wdeclaration-after-statement]". Moving the declaration of 'result' to the beginning like below can 
resolve the warning, and it would be better to use a unique variable instead of 'result'.

#ifdef __darwin__
static char extra_envvars[4096];
+int result = -1;
... ...
-int result = snprintf(extra_envvars, sizeof(extra_envvars), 
"DYLD_LIBRARY_PATH=%s",
+result = snprintf(extra_envvars, sizeof(extra_envvars), "DYLD_LIBRARY_PATH=%s",diff --git a/src/common/exec.c b/src/common/exec.c
index f209b934df..8cf2c21a66 100644
--- a/src/common/exec.c
+++ b/src/common/exec.c
@@ -74,6 +74,12 @@ static char *pg_realpath(const char *fname);
 static BOOL GetTokenUser(HANDLE hToken, PTOKEN_USER *ppTokenUser);
 #endif
 
+#ifdef __darwin__
+static char extra_envvars[4096];
+#else
+static const char extra_envvars[] = "";
+#endif 
+
 /*
  * validate_exec -- validate "path" as an executable file
  *
@@ -330,6 +336,15 @@ find_other_exec(const char *argv0, const char *target,
charcmd[MAXPGPATH];
charline[MAXPGPATH];
 
+#ifdef __darwin__
+int result = 1;
+result = snprintf(extra_envvars, sizeof(extra_envvars), 
"DYLD_LIBRARY_PATH=%s",
+  getenv("DYLD_LIBRARY_PATH"));
+if (result < 0 || result >= sizeof(extra_envvars))
+{   
+printf("extra_envars too small for DYLD_LIBRARY_PATH");
+}
+#endif
if (find_my_exec(argv0, retpath) < 0)
return -1;
 
@@ -344,7 +359,7 @@ 

Re: OK to build LLVM (*.bc) with CLANG but rest of postgresql with CC (other compiler)?

2023-06-22 Thread Andres Freund
Hi,

On 2023-06-13 11:20:52 +0200, Palle Girgensohn wrote:
> CLANG is used to compile *.bc files during postgresql build. Is it OK to
> have a different compiler for the rest of the build? gcc, or even another
> version of clang?

Yes.


> LLVM is an optional add-on, a package. The default version is 15, and it also 
> installs the clang15 binary for the corresponding clang version 15.
> 
> As I understand, you're "better off" compiling the LLVM stuff in PostgreSQL 
> with the same version clang compiler as the LLVM version you're using. Hence, 
> with LLVM 15, set the environment variable CLANG=/path/to/clang15 when 
> running configure. If the .bc files will get compiled by the base system 
> clang compiler, this can lead to a ThinLTO link error, if the base system 
> compiler is a newer versione of llvm.

Yea, the compatibility matrix for llvm bitcode is complicated.


> The question is if it is a bad idea to use the base compiler, say clang13,
> to build postgresql, but set CLANG=clang15 to match the LLVM version. Am I
> better off using clang15 for everything then?

That should be entirely fine. If you already have the newer clang version, it
might also make sense to just use it from a simplicity perspective
(e.g. compiler warnings being the same etc), but it's not required. It works
just fine to compile the postgres binary with gcc and use clang for the
bitcode after all.

Greetings,

Andres Freund




Re: UUID v7

2023-06-22 Thread Nikolay Samokhvalov
On Tue, Feb 14, 2023 at 6:13 AM Kyzer Davis (kydavis)  wrote:
> I am happy to see others interested in the improvements provided by UUIDv7!

Thank you for providing the details!

Some small updates as I see them:
- there is revision 7 now in https://github.com/ietf-wg-uuidrev/rfc4122bis
- noticing that there is no commitfest record, I created one:
https://commitfest.postgresql.org/43/4388/
- recent post by Ants Aasma, Cybertec about the downsides of
traditional UUID raised a big discussion today on HN:
https://news.ycombinator.com/item?id=36429986




Re: Remove deprecation warnings when compiling PG ~13 with OpenSSL 3.0~

2023-06-22 Thread Peter Eisentraut

On 22.06.23 01:53, Michael Paquier wrote:

Looking at the relevant thread from 2020, this was still at the point
where we did not consider supporting 3.0 for all the stable branches
because 3.0 was in alpha:
https://www.postgresql.org/message-id/3d4afcfc-0930-1389-b9f7-59bdf11fb...@2ndquadrant.com

However, recent fixes like cab553a have made that possible, and we do
build with OpenSSL 3.0 across the whole set of stable branches.
Regarding the versions of OpenSSL supported:
- REL_13_STABLE requires 1.0.1 since 7b283d0e1.
- REL_12_STABLE and REL_11_STABLE require 0.9.8.

For 0.9.8, OPENSSL_API_COMPAT needs to be set at 0x00908000L (see
upstream's CHANGES.md).  So I don't see a reason not to do as
suggested by Andres?


The message linked to above also says:

> I'm not sure.  I don't have a good sense of what OpenSSL versions we
> claim to support in branches older than PG13.  We made a conscious
> decision for 1.0.1 in PG13, but I seem to recall that that discussion
> also revealed that the version assumptions before that were quite
> inconsistent.  Code in PG12 and before makes references to OpenSSL as
> old as 0.9.6.  But OpenSSL 3.0.0 will reject a compat level older than
> 0.9.8.





Re: Do we want a hashset type?

2023-06-22 Thread Joel Jacobson
On Tue, Jun 20, 2023, at 14:10, Tomas Vondra wrote:
> This is also what the SQL standard does for multisets - there's SQL:20nn
> draft at http://www.wiscorp.com/SQLStandards.html, and the  predicate> section (p. 475) explains how this should work with NULL.

I've looked again at the paper you mentioned and found something intriguing
in section 2.6 (b). I'm a bit puzzled about this: why would we want to return
null when we're certain it's not null but just doesn't have any elements?

In the same vein, it says, "If it has more than one element, an exception is
raised." Makes sense to me, but what about when there are no elements at all?
Why not raise an exception in that case too?

The ELEMENT function is designed to do one simple thing: return the element of
a multiset if the multiset has only 1 element. This seems very similar to how
our INTO STRICT operates, right?

The SQL:20nn seems to still be in draft form, and I can't help but wonder if we
should propose a bit of an improvement here:

"If it doesn't have exactly one element, an exception is raised."

Meaning, it would raise an exception both if there are more elements,
or zero elements (no elements).

I think this would make the semantics more intuitive and less surprising.

/Joel




Re: [PATCH] Extend ALTER OPERATOR to support adding commutator, negator, hashes, and merges

2023-06-22 Thread Tom Lane
=?utf-8?Q?Dagfinn_Ilmari_Manns=C3=A5ker?=  writes:
> Tom Lane  writes:
>> (Don't we have existing precedents that apply here?  I can't offhand
>> think of any existing ALTER commands that would reject no-op requests,
>> but maybe that's not a direct precedent.)

> Since it only supports adding these operations if they don't already
> exist, should it not be ALTER OPERATOR ADD , not SET ?
> That makes it natural to add an IF NOT EXISTS clause, like ALTER TABLE
> ADD COLUMN has, to make it a no-op instead of an error.

Hmm, maybe.  But it feels like choosing syntax and semantics based
on what might be only a temporary implementation restriction.  We
certainly don't handle any other property-setting commands that way.

Admittedly, "can't change an existing setting" is likely to be pretty
permanent in this case, just because I don't see a use-case for it
that'd justify the work involved.  (My wife recently gave me a coffee
cup that says "Nothing is as permanent as a temporary fix.")  But
still, if someone did show up and do that work, we'd regret this
choice of syntax because it'd then be uselessly unlike every other
ALTER command.

regards, tom lane




Re: [PATCH] Extend ALTER OPERATOR to support adding commutator, negator, hashes, and merges

2023-06-22 Thread Dagfinn Ilmari Mannsåker
Tom Lane  writes:

> Tommy Pavlicek  writes:
>
>> Additionally, I wasn't sure whether it was preferred to fail or succeed on
>> ALTERs that have no effect, such as adding hashes on an operator that
>> already allows them or disabling hashes on one that does not. I chose to
>> raise an error when this happens, on the thinking it was more explicit and
>> made the code simpler, even though the end result would be what the user
>> wanted.
>
> You could argue that both ways I guess.  We definitely need to raise error
> if the command tries to change an existing nondefault setting, since that
> might break things as per previous discussion.  But perhaps rejecting
> an attempt to set the existing setting is overly nanny-ish.  Personally
> I think I'd lean to "don't throw an error if we don't have to", but I'm
> not strongly set on that position.
>
> (Don't we have existing precedents that apply here?  I can't offhand
> think of any existing ALTER commands that would reject no-op requests,
> but maybe that's not a direct precedent.)

Since it only supports adding these operations if they don't already
exist, should it not be ALTER OPERATOR ADD , not SET ?

That makes it natural to add an IF NOT EXISTS clause, like ALTER TABLE
ADD COLUMN has, to make it a no-op instead of an error.

>   regards, tom lane

- ilmari




Re: [PATCH] Extend ALTER OPERATOR to support adding commutator, negator, hashes, and merges

2023-06-22 Thread Tom Lane
Tommy Pavlicek  writes:
> I've attached a couple of patches to allow ALTER OPERATOR to add
> commutators, negators, hashes and merges to operators that lack them.

Please add this to the upcoming commitfest [1], to ensure we don't
lose track of it.

> The first patch is create_op_fixes_v1.patch and it includes some
> refactoring in preparation for the ALTER OPERATOR changes and fixes a
> couple of minor bugs in CREATE OPERATOR:
> - prevents self negation when filling in/creating an existing shell operator
> - remove reference to sort operator in the self negation error message as
> the sort attribute appears to be deprecated in Postgres 8.3

Hmm, yeah, I bet nobody has looked at those edge cases in awhile.

> Additionally, I wasn't sure whether it was preferred to fail or succeed on
> ALTERs that have no effect, such as adding hashes on an operator that
> already allows them or disabling hashes on one that does not. I chose to
> raise an error when this happens, on the thinking it was more explicit and
> made the code simpler, even though the end result would be what the user
> wanted.

You could argue that both ways I guess.  We definitely need to raise error
if the command tries to change an existing nondefault setting, since that
might break things as per previous discussion.  But perhaps rejecting
an attempt to set the existing setting is overly nanny-ish.  Personally
I think I'd lean to "don't throw an error if we don't have to", but I'm
not strongly set on that position.

(Don't we have existing precedents that apply here?  I can't offhand
think of any existing ALTER commands that would reject no-op requests,
but maybe that's not a direct precedent.)

regards, tom lane

[1] https://commitfest.postgresql.org/43/




Re: vac_truncate_clog()'s bogus check leads to bogusness

2023-06-22 Thread Andres Freund
Hi,

On 2023-06-21 21:50:39 -0700, Noah Misch wrote:
> On Wed, Jun 21, 2023 at 03:12:08PM -0700, Andres Freund wrote:
> > When vac_truncate_clog() returns early
> ...
> > we haven't released the lwlock that we acquired earlier
> 
> > Until there's some cause for the session to call LWLockReleaseAll(), the 
> > lock
> > is held. Until then neither the process holding the lock, nor any other
> > process, can finish vacuuming.  We don't even have an assert against a
> > self-deadlock with an already held lock, oddly enough.
> 
> I agree with this finding.  Would you like to add the lwlock releases, or
> would you like me to?

Happy with either.  I do have code and testcase, so I guess it would make
sense for me to do it?


> The bug has been in all released versions for 2.5 years, yet it escaped
> notice.  That tells us something.  Bogus values have gotten rare?  The
> affected session tends to get lucky and call LWLockReleaseAll() soon?

I am not sure either. I suspect that part of it is that people couldn't even
pinpoint the problem when it happened.  Process exit calls LWLockReleaseAll(),
which I assume would avoid the problem in many cases.

Greetings,

Andres Freund




Re: Assert while autovacuum was executing

2023-06-22 Thread Andres Freund
Hi,

On 2023-06-22 10:00:01 +0530, Amit Kapila wrote:
> On Wed, Jun 21, 2023 at 11:53 AM Peter Geoghegan  wrote:
> >
> > On Tue, Jun 20, 2023 at 10:27 PM Andres Freund  wrote:
> > > As far as I can tell 72e78d831a as-is is just bogus. Unfortunately that 
> > > likely
> > > also means 3ba59ccc89 is not right.
> >
> > Quite possibly. But I maintain that ginInsertCleanup() is probably
> > also bogus in a way that's directly relevant.
> >
> > Did you know that ginInsertCleanup() is the only code that uses
> > heavyweight page locks these days? Though only on the index metapage!
> >
> > Isn't this the kind of thing that VACUUM's relation level lock is
> > supposed to take care of?
> >
> 
> Yeah, I also can't see why that shouldn't be sufficient for VACUUM.

I'd replied on that point to Peter earlier, accidentlly loosing the CC
list. The issue is that ginInsertCleanup() isn't just called from VACUUM, but
also from normal inserts (to reduce the size of the fastupdate list).

You can possibly come up with another scheme, but I think just doing this via
the relation lock might be problematic. Suddenly an insert would, temporarily,
also block operations that don't normally conflict with inserts etc.

Greetings,

Andres Freund




[PATCH] Extend ALTER OPERATOR to support adding commutator, negator, hashes, and merges

2023-06-22 Thread Tommy Pavlicek
Hi All,

I've attached a couple of patches to allow ALTER OPERATOR to add
commutators, negators, hashes and merges to operators that lack them.

The need for this arose adding hash functions to the ltree type after the
operator had been created without hash support[1]. There are potential
issues with modifying these attributes that have been discussed
previously[2], but I understand that setting them, if they have not been
set before, is ok.

I belatedly realised that it may not be desirable or necessary to allow
adding commutators and negators in ALTER OPERATOR because the linkage can
already be added when creating a new operator. I don't know what's best, so
I thought I'd post this here and get feedback before removing anything.

The first patch is create_op_fixes_v1.patch and it includes some
refactoring in preparation for the ALTER OPERATOR changes and fixes a
couple of minor bugs in CREATE OPERATOR:
- prevents self negation when filling in/creating an existing shell operator
- remove reference to sort operator in the self negation error message as
the sort attribute appears to be deprecated in Postgres 8.3

The second patch is alter_op_v1.patch which contains the changes to ALTER
OPERATOR and depends on create_op_fixes_v1.patch.

Additionally, I wasn't sure whether it was preferred to fail or succeed on
ALTERs that have no effect, such as adding hashes on an operator that
already allows them or disabling hashes on one that does not. I chose to
raise an error when this happens, on the thinking it was more explicit and
made the code simpler, even though the end result would be what the user
wanted.

Comments appreciated.

Thanks,
Tommy

[1]
https://www.postgresql.org/message-id/flat/CAEhP-W9ZEoHeaP_nKnPCVd_o1c3BAUvq1gWHrq8EbkNRiS9CvQ%40mail.gmail.com
[2] https://www.postgresql.org/message-id/flat/3348985.V7xMLFDaJO@dinodell


alter_op_v1.patch
Description: Binary data


create_op_fixes_v1.patch
Description: Binary data


Re: New function to show index being vacuumed

2023-06-22 Thread Imseih (AWS), Sami
> I'm sorry for not having read (and not reading) the other thread yet,
> but what was the reason we couldn't store that oid in a column in the
> pg_s_p_vacuum-view?


> Could you summarize the other solutions that were considered for this issue?

Thanks for your feedback!

The reason we cannot stick the oid in pg_s_p_vacuum is because it will
not work for parallel vacuum as only the leader process has an entry
in pg_s_p_vacuum.

With a function the leader or worker pid can be passed in to the function
and will return the indexrelid being processed.

Regards,

Sami








Re: New function to show index being vacuumed

2023-06-22 Thread Matthias van de Meent
On Thu, 22 Jun 2023 at 16:45, Imseih (AWS), Sami  wrote:
>
> Hi,
>
> [1] is a ready-for-committer enhancement to pg_stat_progress_vacuum which 
> exposes
> the total number of indexes to vacuum and how many indexes have been vacuumed 
> in
> the current vacuum cycle.
>
> To even further improve visibility into index vacuuming, it would be 
> beneficial to have a
> function called pg_stat_get_vacuum_index(pid) that takes in a pid and returns 
> the
> indexrelid of the index being processed.

I'm sorry for not having read (and not reading) the other thread yet,
but what was the reason we couldn't store that oid in a column in the
pg_s_p_vacuum-view?

Could you summarize the other solutions that were considered for this issue?

Kind regards,

Matthias van de Meent
Neon, Inc.




Re: allow granting CLUSTER, REFRESH MATERIALIZED VIEW, and REINDEX

2023-06-22 Thread Nathan Bossart
On Thu, Jun 22, 2023 at 04:11:08PM +0900, Michael Paquier wrote:
> Sounds good to me.  Thanks.

I plan to commit these patches later today.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




New function to show index being vacuumed

2023-06-22 Thread Imseih (AWS), Sami
Hi,

[1] is a ready-for-committer enhancement to pg_stat_progress_vacuum which 
exposes
the total number of indexes to vacuum and how many indexes have been vacuumed in
the current vacuum cycle.

To even further improve visibility into index vacuuming, it would be beneficial 
to have a
function called pg_stat_get_vacuum_index(pid) that takes in a pid and returns 
the
indexrelid of the index being processed.

Currently the only way to get the index being vacuumed by a process
Is through os tools such as pstack.

I had a patch for this as part of [1], but it was decided to handle this in a 
separate
discussion.

Comments/feedback will be appreciated before sending out a v1 of the patch.


Regards,

Sami Imseih
Amazon Web Services (AWS)

1. 
https://www.postgresql.org/message-id/flat/5478dfcd-2333-401a-b2f0-0d186ab09...@amazon.com




Re: memory leak in trigger handling (since PG12)

2023-06-22 Thread Tomas Vondra
On 6/22/23 13:46, Tomas Vondra wrote:
> ...
> 
> I haven't tried the reproducer, but I think I see the issue - we store
> the bitmap as part of the event to be executed later, but the bitmap is
> in per-tuple context and gets reset. So I guess we need to copy it into
> the proper long-lived context (e.g. AfterTriggerEvents).
> 
> I'll get that fixed.
> 

Alexander, can you try if this fixes the issue for you?


regard

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Companydiff --git a/src/backend/commands/trigger.c b/src/backend/commands/trigger.c
index 4b295f8da5..0073f576f0 100644
--- a/src/backend/commands/trigger.c
+++ b/src/backend/commands/trigger.c
@@ -3976,6 +3976,36 @@ afterTriggerCheckState(AfterTriggerShared evtshared)
 	return ((evtshared->ats_event & AFTER_TRIGGER_INITDEFERRED) != 0);
 }
 
+/* --
+ * afterTriggerCopyBitmap()
+ *
+ *	Copy bitmap into AfterTriggerEvents memory context.
+ * --
+ */
+static Bitmapset *
+afterTriggerCopyBitmap(Bitmapset *src)
+{
+	Bitmapset	   *dst;
+	MemoryContext	oldcxt;
+
+	if (src == NULL)
+		return NULL;
+
+	/* Create event context if we didn't already */
+	if (afterTriggers.event_cxt == NULL)
+		afterTriggers.event_cxt =
+			AllocSetContextCreate(TopTransactionContext,
+  "AfterTriggerEvents",
+  ALLOCSET_DEFAULT_SIZES);
+
+	oldcxt = MemoryContextSwitchTo(afterTriggers.event_cxt);
+
+	dst = bms_copy(src);
+
+	MemoryContextSwitchTo(oldcxt);
+
+	return dst;
+}
 
 /* --
  * afterTriggerAddEvent()
@@ -6387,7 +6417,7 @@ AfterTriggerSaveEvent(EState *estate, ResultRelInfo *relinfo,
 			new_shared.ats_table = transition_capture->tcs_private;
 		else
 			new_shared.ats_table = NULL;
-		new_shared.ats_modifiedcols = modifiedCols;
+		new_shared.ats_modifiedcols = afterTriggerCopyBitmap(modifiedCols);
 
 		afterTriggerAddEvent(_stack[afterTriggers.query_depth].events,
 			 _event, _shared);


Re: Add GUC to tune glibc's malloc implementation.

2023-06-22 Thread Tom Lane
Ronan Dunklau  writes:
> Le jeudi 22 juin 2023, 15:49:36 CEST Tom Lane a écrit :
>> Aren't these same settings controllable via environment variables?
>> I could see adding some docs suggesting that you set thus-and-such
>> values in the postmaster's startup script.  Admittedly, the confusion
>> argument is perhaps still raisable; but we have a similar docs section
>> discussing controlling Linux OOM behavior, and I've not heard much
>> complaints about that.

> Yes they are, but controlling them via an environment variable for the whole 
> cluster defeats the point: different backends have different workloads, and 
> being able to make sure for example the OLAP user is memory-greedy while the 
> OLTP one is as conservative as possible is a worthwile goal.

And what is going to happen when we switch to a thread model?
(I don't personally think that's going to happen, but some other
people do.)  If we only document how to adjust this cluster-wide,
then we won't have a problem with that.  But I'm not excited about
introducing functionality that is both platform-dependent and
unsupportable in a threaded system.

regards, tom lane




Re: bgwriter doesn't flush WAL stats

2023-06-22 Thread Melanie Plageman
On Wed, Jun 21, 2023 at 9:49 PM Kyotaro Horiguchi
 wrote:
> Regarding the second patch, it introduces WAL IO time as a
> IOCONTEXT_NORMAL/IOOBJECT_WAL, but it doesn't seem to follow the
> convention or design of the pgstat_io component, which primarily
> focuses on shared buffer IOs.

I haven't reviewed the patch yet, but in my opinion having an
IOOBJECT_WAL makes sense. I imagined that we would add WAL as an
IOObject along with others such as an IOOBJECT_BYPASS for "bypass" IO
(IO done through the smgr API directly) and an  IOOBJECT_SPILL or
something like it for spill files from joins/aggregates/etc.

> > I do have a question about this.
> > So, if we were to start tracking WAL IO would it fit within this
> > paradigm to have a new IOPATH_WAL for WAL or would it add a separate
> > dimension?

Personally, I think WAL fits well as an IOObject. Then we can add
IOCONTEXT_INIT and use that for WAL file initialization and
IOCONTEXT_NORMAL for normal WAL writes/fysncs/etc. I don't think we
need a new dimension for it as it feels like an IO target just like
shared buffers and temporary buffers do. I think we should save adding
new dimensions for relationships that we can't express in the existing
paradigm.

- Melanie




Re: Add GUC to tune glibc's malloc implementation.

2023-06-22 Thread Ronan Dunklau
Le jeudi 22 juin 2023, 15:49:36 CEST Tom Lane a écrit :
> This seems like a pretty awful idea, mainly because there's no way
> to have such a GUC mean anything on non-glibc platforms, which is
> going to cause confusion or worse.

I named the GUC glibc_malloc_max_trim_threshold, I hope this is enough to 
clear up the confusion. We already have at least event_source, which is 
windows specific even if it's not clear from the name. 

> 
> Aren't these same settings controllable via environment variables?
> I could see adding some docs suggesting that you set thus-and-such
> values in the postmaster's startup script.  Admittedly, the confusion
> argument is perhaps still raisable; but we have a similar docs section
> discussing controlling Linux OOM behavior, and I've not heard much
> complaints about that.

Yes they are, but controlling them via an environment variable for the whole 
cluster defeats the point: different backends have different workloads, and 
being able to make sure for example the OLAP user is memory-greedy while the 
OLTP one is as conservative as possible is a worthwile goal.  Or even a 
specific backend may want to raise it's work_mem and adapt glibc behaviour 
accordingly, then get back to being conservative with memory until the next 
such transaction. 

Regards,

--
Ronan Dunklau






Re: Add GUC to tune glibc's malloc implementation.

2023-06-22 Thread Tom Lane
Ronan Dunklau  writes:
> Following some conversation with Tomas at PGCon, I decided to resurrect this 
> topic, which was previously discussed in the context of moving tuplesort to 
> use GenerationContext: https://www.postgresql.org/message-id/
> 8046109.NyiUUSuA9g%40aivenronan

This seems like a pretty awful idea, mainly because there's no way
to have such a GUC mean anything on non-glibc platforms, which is
going to cause confusion or worse.

Aren't these same settings controllable via environment variables?
I could see adding some docs suggesting that you set thus-and-such
values in the postmaster's startup script.  Admittedly, the confusion
argument is perhaps still raisable; but we have a similar docs section
discussing controlling Linux OOM behavior, and I've not heard much
complaints about that.

regards, tom lane




Add GUC to tune glibc's malloc implementation.

2023-06-22 Thread Ronan Dunklau
Hello, 

Following some conversation with Tomas at PGCon, I decided to resurrect this 
topic, which was previously discussed in the context of moving tuplesort to 
use GenerationContext: https://www.postgresql.org/message-id/
8046109.NyiUUSuA9g%40aivenronan

The idea for this patch is that the behaviour of glibc's malloc can be 
counterproductive for us in some cases. To summarise, glibc's malloc offers 
(among others) two tunable parameters which greatly affects how it allocates 
memory. From the mallopt manpage:

   M_TRIM_THRESHOLD
  When the amount of contiguous free memory at the top of
  the heap grows sufficiently large, free(3) employs sbrk(2)
  to release this memory back to the system.  (This can be
  useful in programs that continue to execute for a long
  period after freeing a significant amount of memory.)  

   M_MMAP_THRESHOLD
  For allocations greater than or equal to the limit
  specified (in bytes) by M_MMAP_THRESHOLD that can't be
  satisfied from the free list, the memory-allocation
  functions employ mmap(2) instead of increasing the program
  break using sbrk(2).

The thing is, by default, those parameters are adjusted dynamically by the 
glibc itself. It starts with quite small thresholds, and raises them when the 
program frees some memory, up to a certain limit. This patch proposes a new 
GUC allowing the user to adjust those settings according to their workload.

This can cause problems. Let's take for example a table with 10k rows, and 32 
columns (as defined by a bench script David Rowley shared last year when 
discussing the GenerationContext for tuplesort), and execute the following 
query, with 32MB of work_mem:

select * from t order by a offset 10;

On unpatched master, attaching strace to the backend and grepping on brk|mmap, 
we get the following syscalls:

brk(0x55b00df0c000) = 0x55b00df0c000
brk(0x55b00df05000) = 0x55b00df05000
brk(0x55b00df28000) = 0x55b00df28000
brk(0x55b00df52000) = 0x55b00df52000
mmap(NULL, 266240, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 
0x7fbc49254000
brk(0x55b00df7e000) = 0x55b00df7e000
mmap(NULL, 528384, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 
0x7fbc48f7f000
mmap(NULL, 1052672, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 
0x7fbc48e7e000
mmap(NULL, 200704, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 
0x7fbc4980f000
brk(0x55b00df72000) = 0x55b00df72000
mmap(NULL, 2101248, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 
0x7fbc3d56d000

Using systemtap, we can hook to glibc's mallocs static probes to log whenever 
it adjusts its values. During the above queries, glibc's malloc raised its 
thresholds:

347704: New thresholds: mmap: 2101248 bytes, trim: 4202496 bytes


If we re-run the query again, we get: 

brk(0x55b00dfe2000) = 0x55b00dfe2000
brk(0x55b00e042000) = 0x55b00e042000
brk(0x55b00e0ce000) = 0x55b00e0ce000
brk(0x55b00e1e6000) = 0x55b00e1e6000
brk(0x55b00e216000) = 0x55b00e216000
brk(0x55b00e416000) = 0x55b00e416000
brk(0x55b00e476000) = 0x55b00e476000
brk(0x55b00dfbc000) = 0x55b00dfbc000

This time, our allocations are below the new mmap_threshold, so malloc gets us 
our memory by repeatedly moving the brk pointer. 

When running with the attached patch, and setting the new GUC:

set glibc_malloc_max_trim_threshold = '64MB';

We now get the following syscalls for the same query, for the first run:

brk(0x55b00df0c000) = 0x55b00df0c000
brk(0x55b00df2e000) = 0x55b00df2e000
brk(0x55b00df52000) = 0x55b00df52000
brk(0x55b00dfb2000) = 0x55b00dfb2000
brk(0x55b00e03e000) = 0x55b00e03e000
brk(0x55b00e156000) = 0x55b00e156000
brk(0x55b00e186000) = 0x55b00e186000
brk(0x55b00e386000) = 0x55b00e386000
brk(0x55b00e3e6000) = 0x55b00e3e6000

But for the second run, the memory allocated is kept by malloc's freelist 
instead of being released to the kernel,  generating no syscalls at all, which 
brings us a significant performance improvement at the cost of more memory 
being used by the idle backend, up to twice as more tps.

On the other hand, the default behaviour can also be a problem if a backend 
makes big allocations for a short time and then never needs that amount of 
memory again.

For example, running this query: 

select * from generate_series(1, 100);

We allocate some memory. The first time it's run, malloc will use mmap to 
satisfy it. Once it's freed, it will raise it's threshold, 

Re: Making empty Bitmapsets always be NULL

2023-06-22 Thread Ranier Vilela
Em qui., 22 de jun. de 2023 às 01:43, David Rowley 
escreveu:

> On Thu, 22 Jun 2023 at 00:16, Ranier Vilela  wrote:
> > 2. Only compute BITNUM when necessary.
>
> I doubt this will help.  The % 64 done by BITNUM will be transformed
> to an AND operation by the compiler which is likely going to be single
> instruction latency on most CPUs which probably amounts to it being
> "free".  There's maybe a bit of reading for you in [1] and [2] if
> you're wondering how any operation could be free.
>
I think the word free is not the right one.
The end result of the code is the same, so whatever you write it one way or
the other,
the compiler will transform it as if it were written without calculating
BITNUM in advance.

See at:
https://godbolt.org/z/39MdcP7M3

The issue is the code becomes clearer and more readable with the
calculation in advance.
In that case, I think so.
But this is on a case-by-case basis, in other contexts it can be more
expensive.


>
> (The compiler is able to transform the % into what is effectively &
> because 64 is a power of 2.  uintvar % 64 is the same as uintvar & 63.
> Play around with [3] to see what I mean)
>
> > 3. Avoid enlargement when nwords is equal wordnum.
> > Can save cycles when in corner cases?
>
> No, you're just introducing a bug here.  Arrays in C are zero-based,
> so "wordnum >=  a->nwords" is exactly the correct way to check if
> wordnum falls outside the bounds of the existing allocated memory. By
> changing that to "wordnum > a->nwords" we'll fail to enlarge the words
> array when it needs to be enlarged by 1 element.
>
Yeah, this is my fault.
Unfortunately, I missed the failure of the regression tests.


> It looks like you've introduced a bunch of random white space and
> changed around a load of other random things in the patch too. I'm not
> sure why you think that's a good idea.
>
Weel,  It is much easier to read and follows the general style of the other
fonts.


> FWIW, we normally only write "if (somevar)" as a shortcut when somevar
> is boolean and we want to know that it's true.   The word value is not
> a boolean type, so although "if (someint)" and "if (someint != 0)"
> will compile to the same machine code, we don't normally write our C
> code that way in PostgreSQL.  We also tend to write "if (someptr !=
> NULL)" rather than "if (someptr)". The compiler will produce the same
> code for each, but we write the former to assist people reading the
> code so they know we're checking for NULL rather than checking if some
> boolean variable is true.
>
No, this is not the case.
With unsigned words, it can be a more appropriate test without == 0.

See:
https://stackoverflow.com/questions/14267081/difference-between-je-jne-and-jz-jnz

In some contexts, it can be faster when it has CMP instruction before.


> Overall, I'm not really interested in sneaking any additional changes
> that are unrelated to adjusting Bitmapsets so that don't carry
> trailing zero words. If have other optimisations you think are
> worthwhile, please include them in another thread along with
> benchmarks to show the performance increase.  For learning, I'd
> encourage you to do some micro benchmarks outside of PostgreSQL and
> mock up some Bitmapset code in a single .c file and try out with any
> without your changes after calling the function in a tight loop to see
> if you can measure any performance gains. Just remember you'll never
> see any gains in performance when your change compiles into the exact
> same code as without your change.  Between [1] and [2], you still
> might not see performance changes even when the compiled code is
> changed (I'm thinking of your #2 change here).
>
Well, *const* always is a good style and can prevent mistakes and
allows the compiler to do optimizations.

regards,
Ranier Vilela


Re: memory leak in trigger handling (since PG12)

2023-06-22 Thread Tomas Vondra



On 6/22/23 13:07, Alexander Pyhalov wrote:
> Tomas Vondra писал 2023-05-25 17:41:
> 
>> The attached patch does this - I realized we actually have estate in
>> ExecGetAllUpdatedCols(), so we don't even need a variant with a
>> different signature. That makes the patch much simpler.
>>
>> The question is whether we need the signature anyway. There might be a
>> caller expecting the result to be in CurrentMemoryContext (be it
>> ExecutorState or something else), and this change might break it. But
>> I'm not aware of any callers, so maybe that's fine.
>>
> 
> Hi.
> Unfortunately, this patch has broken trigdata->tg_updatedcols usage in
> AFTER UPDATE triggers.
> Should it be if not fixed, but at least mentioned in documentation?
> 

That's clearly a bug, we can't just break stuff like that.

> Attaching sample code. After creating trigger, an issue can be
> reproduced as this:
> 
> create table test (i int, j int);
> create function report_update_fields() RETURNS TRIGGER AS
> '/location/to/trig_test.so' language C;
> create trigger test_update after update ON test FOR EACH ROW EXECUTE
> FUNCTION report_update_fields();
> insert into test values (1, 12);
> update test set j=2;
> 

I haven't tried the reproducer, but I think I see the issue - we store
the bitmap as part of the event to be executed later, but the bitmap is
in per-tuple context and gets reset. So I guess we need to copy it into
the proper long-lived context (e.g. AfterTriggerEvents).

I'll get that fixed.

Anyway, this probably hints we should not recalculate the bitmap over
and over, but calculate it once and keep it for all events. Not
something we can do in backbranches, though.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Partial aggregates pushdown

2023-06-22 Thread Bruce Momjian
On Thu, Jun 22, 2023 at 05:23:33AM +, 
fujii.y...@df.mitsubishielectric.co.jp wrote:
> Approach1-3:
> I will add a postgres_fdw option "check_partial_aggregate_support".
> This option is false, default.
> Only if this option is true, postgres_fdw connect to the remote server and 
> get the version of the remote server.
> And if the version of the remote server is less than PG17, then partial 
> aggregate push down to the remote server is disable.

Great!

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Only you can decide what is important to you.




Re: memory leak in trigger handling (since PG12)

2023-06-22 Thread Alexander Pyhalov

Tomas Vondra писал 2023-05-25 17:41:


The attached patch does this - I realized we actually have estate in
ExecGetAllUpdatedCols(), so we don't even need a variant with a
different signature. That makes the patch much simpler.

The question is whether we need the signature anyway. There might be a
caller expecting the result to be in CurrentMemoryContext (be it
ExecutorState or something else), and this change might break it. But
I'm not aware of any callers, so maybe that's fine.



Hi.
Unfortunately, this patch has broken trigdata->tg_updatedcols usage in 
AFTER UPDATE triggers.

Should it be if not fixed, but at least mentioned in documentation?

Attaching sample code. After creating trigger, an issue can be 
reproduced as this:


create table test (i int, j int);
create function report_update_fields() RETURNS TRIGGER AS 
'/location/to/trig_test.so' language C;
create trigger test_update after update ON test FOR EACH ROW EXECUTE 
FUNCTION report_update_fields();

insert into test values (1, 12);
update test set j=2;

--
Best regards,
Alexander Pyhalov,
Postgres Professional#include 
#include 
#include 
#include 
#include 

#ifdef PG_MODULE_MAGIC
PG_MODULE_MAGIC;
#endif


void _PG_init()
{
}

PG_FUNCTION_INFO_V1(report_update_fields);

Datum
report_update_fields(PG_FUNCTION_ARGS)
{
	TriggerData *trigdata = (TriggerData *) fcinfo->context;
	int			tgnargs,
ret;
	char	   *rgids,
			   *gtname,
			   *nspname,
			   *rname;
	HeapTuple	rettuple = NULL;
	List	   *rgidlist = NIL;
	ListCell   *lc;

	if (!CALLED_AS_TRIGGER(fcinfo) || /*!TRIGGER_FIRED_AFTER(trigdata->tg_event) || */ TRIGGER_FIRED_FOR_STATEMENT(trigdata->tg_event) || !TRIGGER_FIRED_BY_UPDATE(trigdata->tg_event))
		elog(ERROR, "usage unsupported");
	
	if ((trigdata->tg_trigtuple != NULL) && (trigdata->tg_newtuple != NULL))
	{
int attnum = -1;
int num_upd_attrs = 0;

while ((attnum = bms_next_member(trigdata->tg_updatedcols, attnum)) >= 0)
if (attnum + FirstLowInvalidHeapAttributeNumber > 0)
num_upd_attrs++;

elog(INFO, "updated %d attrs", num_upd_attrs);
}
}



Re: Support logical replication of DDLs

2023-06-22 Thread Amit Kapila
On Tue, Jun 13, 2023 at 1:21 PM Michael Paquier  wrote:
>
> The patch is made of a lot of one-one mapping between enum structures
> and hardcoded text used in the JSON objects, making it something hard
> to maintain if a node field is added, removed or even updated into
> something else.  I have mentioned that during PGCon, but here is
> something more durable: why don't we generate more of this code
> automatically based on the structure of the nodes we are looking at?
>

As far as I understand, the main idea behind the generation of code
based on the structure of node is that in most such cases, we generate
a common functionality based on each structure element (like
"comparison", "copy", "read", "write", or "append to jumble" that
element). There are exceptions to it in some cases in which we deal
with pg_node_attr annotations. However, the deparsing is different in
the sense that in many cases, there is no one-to-one mapping between
elements of structure and DDL's deparse generation.
For example,
1. Annotating fields to access the catalog won't be sufficient, we
need to tell the catalog's field, operator, etc., and writing such
functions for access will vary based on the type of DDL command.
Additionally, we may call some extra functions to get the required
output. See RelationGetPartitionBound. We can probably someway
annotate the field to call particular functions.
2. For part of the DDL creation, we primarily need to rely on catalogs
where no struct field is used. For example, creating identity
(schema+relation name) in CreateStmt, and autogenerating column
information won't seem feasible just by annotating structure, see
deparse_TableElems_ToJsonb and friends. The other example is that when
deparsing the CREATE TABLE command, the persistence/of_type/relkind
need to be fetched from the Relation structure(get from
relation_open()). There are other similar cases.
3. Another challenge is that to allow the elements to be processed in
the correct format, we need to form the statement in a particular
order. So, adding fields in between structures requires a manual
change in the deparse functions. Basically, the current output of
deparser includes a format string that can be used to format the plain
DDL strings by well-defined sprintf-like expansion. The format string
looks like this:

"CREATE %{persistence}s TABLE %{if_not_exists}s %{identity}D
(%{table_elements:, }s) %{inherits}s %{partition_by} ..."

The syntax format depends on whether each syntax part is necessary or
not. (For example, for the non-partition table, it doesn't have the
"%{partition_by}" part). So, when deparsing, we need to append each
syntax part to the format string separately and each syntax part(like
%{..}) needs to be generated in the correct order (otherwise, we
cannot expand it to a DDL command). It would be difficult to
automatically generate the format string in the correct order from the
structure members because the structure members' order varies.
4. RangeVar's variable could be appended in one way for "Alter Table"
but another way for "Create Table". When used via AlterTableStmt, we
need it to append ONLY clause whereas we don't need it in CreateStmt
5. IndexStmt is used differently for Alter Subcommands. In
AddIndexConstraint, some of its elements are used for keys whereas it
is not at all used in AddIndex for some assert checks.
6. Then the catalog table is opened once and the required information
is used during the entire deparse of the statement. We may need to
think about that as well.

Having said that, we are still trying to write a patch to see how it
looks, which may help us to jointly evaluate if we can do anything
better.

-- 
With Regards,
Amit Kapila.




Re: Support logical replication of DDLs

2023-06-22 Thread shveta malik
On Wed, Jun 21, 2023 at 6:38 PM Jelte Fennema  wrote:
>
> (to be clear I only skimmed the end of this thread and did not look at
> all the previous messages)
>
> I took a quick look at the first patch (about deparsing table ddl) and
> it seems like this would also be very useful for a SHOW CREATE TABLE,
> like command. Which was suggested in this thread:
> https://www.postgresql.org/message-id/flat/CAFEN2wxsDSSuOvrU03CE33ZphVLqtyh9viPp6huODCDx2UQkYA%40mail.gmail.com
>
> On that thread I sent a patch with Citus its CREATE TABLE deparsing as
> starting point. It looks like this thread went quite a different route
> with some JSON intermediary representation. Still it might be useful
> to look at the patch with Citus its logic for some inspiration/copying
> things. I re-attached that patch here for ease of finding it.

Thank You for attaching the patch for our ease.
We rely on JSONB because of the flexibility it provides. It is easy to
be parsed/processed/transformed arbitrarily by the subscriber using
generic rules. It should be trivial to use a JSON tool to change
schema A to schema B in any arbitrary DDL command, and produce another
working DDL command without having to know how to write that command
specifically.
It helps in splitting commands as well. As an example, we may need to
split commands like "ALTER TABLE foo ADD COLUMN bar double precision
DEFAULT random();" so that random() have consistent values on
publisher and subscriber. It would be convenient to break commands via
deparsing approach rather than via plain string.

Above being said, show table command can be implemented from ddl
deparse code using below steps:
1) Deparsing to create JSONB format using deparsing API ddl_deparse_to_json.
2) Expanding it back to DDL command using expansion API
ddl_deparse_expand_command.

But these APIs rely on getting information from parse-tree. This is
because we need to construct complete DDL string and info like "IF NOT
EXISTS", "CONCURRENTLY" etc can not be obtained from pg_catalog. Even
if we think of getting rid of parsetree, it may hit the performance,
as it is more efficient for us to get info from parse-tree instead of
doing catalog-access for everything.

We will try to review your patch to see if there is anything which we
can adopt without losing performance and flexibility. Meanwhile if you
have any suggestions on our patch which can make your work simpler,
please do let us know. We can review that.

thanks
Shveta




Re: Skip collecting decoded changes of already-aborted transactions

2023-06-22 Thread Amit Kapila
On Wed, Jun 21, 2023 at 8:12 AM Masahiko Sawada  wrote:
>
> On Thu, Jun 15, 2023 at 7:50 PM Amit Kapila  wrote:
> >
> > On Tue, Jun 13, 2023 at 2:06 PM Masahiko Sawada  
> > wrote:
> > >
> > > On Sun, Jun 11, 2023 at 5:31 AM Andres Freund  wrote:
> > > >
> > > > A separate issue is that TransactionIdDidAbort() can end up being very 
> > > > slow if
> > > > a lot of transactions are in progress concurrently. As soon as the clog
> > > > buffers are extended all time is spent copying pages from the kernel
> > > > pagecache.  I'd not at all be surprised if this changed causes a 
> > > > substantial
> > > > slowdown in workloads with lots of small transactions, where most 
> > > > transactions
> > > > commit.
> > > >
> > >
> > > Indeed. So it should check the transaction status less frequently. It
> > > doesn't benefit much even if we can skip collecting decoded changes of
> > > small transactions. Another idea is that we check the status of only
> > > large transactions. That is, when the size of decoded changes of an
> > > aborted transaction exceeds logical_decoding_work_mem, we mark it as
> > > aborted , free its changes decoded so far, and skip further
> > > collection.
> > >
> >
> > Your idea might work for large transactions but I have not come across
> > reports where this is reported as a problem. Do you see any such
> > reports and can we see how much is the benefit with large
> > transactions? Because we do have the handling of concurrent aborts
> > during sys table scans and that might help sometimes for large
> > transactions.
>
> I've heard there was a case where a user had 29 million deletes in a
> single transaction with each one wrapped in a savepoint and rolled it
> back, which led to 11TB of spill files. If decoding such a large
> transaction fails for some reasons (e.g. a disk full), it would try
> decoding the same transaction again and again.
>

I was thinking why the existing handling of concurrent aborts doesn't
handle such a case and it seems that we check that only on catalog
access. However, in your case, the user probably is accessing the same
relation without any concurrent DDL on the same table, so it would
just be a cache look-up for catalogs. Your idea of checking aborts
every logical_decoding_work_mem should work for such cases.

-- 
With Regards,
Amit Kapila.




Re: Making empty Bitmapsets always be NULL

2023-06-22 Thread Yuya Watari
Hello,

On Tue, Jun 20, 2023 at 1:17 PM David Rowley  wrote:
> I've adjusted the attached patch to do that.

Thank you for updating the patch. The v4 patch looks good to me.

I ran another experiment. In the experiment, I issued queries of the
Join Order Benchmark [1] and measured its planning times. The
following table shows the result. The v4 patch obtained outstanding
performance improvements in planning time. This result supports the
effectiveness of the patch in real workloads.

Table 1: Planning time and its speedup of Join Order Benchmark
(n: the number of partitions of each table)
(Speedup: higher is better)

   n | Speedup (v4)

   2 |   102.4%
   4 |   101.0%
   8 |   101.6%
  16 |   103.1%
  32 |   107.5%
  64 |   115.7%
 128 |   142.9%
 256 |   187.7%


[1] https://github.com/winkyao/join-order-benchmark

-- 
Best regards,
Yuya Watari




Re: Making empty Bitmapsets always be NULL

2023-06-22 Thread Yuya Watari
Hello,

On Thu, Jun 22, 2023 at 1:43 PM David Rowley  wrote:
> > 3. Avoid enlargement when nwords is equal wordnum.
> > Can save cycles when in corner cases?
>
> No, you're just introducing a bug here.  Arrays in C are zero-based,
> so "wordnum >=  a->nwords" is exactly the correct way to check if
> wordnum falls outside the bounds of the existing allocated memory. By
> changing that to "wordnum > a->nwords" we'll fail to enlarge the words
> array when it needs to be enlarged by 1 element.

I agree with David. Unfortunately, some of the regression tests failed
with the v5 patch. These failures are due to the bug introduced by the
#3 change.

-- 
Best regards,
Yuya Watari




Re: Remove deprecation warnings when compiling PG ~13 with OpenSSL 3.0~

2023-06-22 Thread Michael Paquier
On Thu, Jun 22, 2023 at 10:02:58AM +0200, Daniel Gustafsson wrote:
> These patches LGTM from reading,

Thanks for double-checking.

> but I think the Discussion link in the commit
> messages should refer to this thread as well.

Of course.
--
Michael


signature.asc
Description: PGP signature


Re: Remove deprecation warnings when compiling PG ~13 with OpenSSL 3.0~

2023-06-22 Thread Daniel Gustafsson
> On 22 Jun 2023, at 01:53, Michael Paquier  wrote:

> I have tested the attached patches across 11~13 with various versions
> of OpenSSL (OPENSSL_API_COMPAT exists since 1.1.0), and this is
> working here.  Note that I don't have a MSVC environment at hand to
> test this change on Windows, still `perl -cw Solution.pm` is OK with
> it.

These patches LGTM from reading, but I think the Discussion link in the commit
messages should refer to this thread as well.

--
Daniel Gustafsson





Re: Allow pg_archivecleanup to remove backup history files

2023-06-22 Thread Kyotaro Horiguchi
At Wed, 21 Jun 2023 23:41:33 +0900, torikoshia  
wrote in 
> --v10-0002-Preliminary-refactoring-for-a-subsequent-patch.patch
> + * Also we skip backup history files when --clean-backup-history
> +* is not specified.
> +*/
> + if (!IsXLogFileName(walfile) && !IsPartialXLogFileName(walfile))
> +   continue;
> 
> I think this comment should be located in 0003.

Auch! Right.

> Attached updated patches.

Thanks!


0001:

Looks good to me.

0002:

+* Check file name.
+*
+* We skip files which are not WAL file or partial WAL file.

There's no need to spend this many lines describing this, and it's
suggested to avoid restating what the code does. I think a comment
might not be necessary. But if we decide to have one, it could look
like this:

>/* we're only removing specific types of files */

Other than that, it looks good to me.


0003:
+   
+ Remove backup history files.

I might be overthinking it, but I'd phrase it as, "Remove backup
history files *as well*.".  (The --help message describes the same
thing using "including".)



+ For details about backup history file, please refer to the .

We usually write this as simple as "See  for details (of the
backup history files)" or "Refer to  for more information
(about the backup history files)." or such like... (I think)



+bool   cleanBackupHistory = false; /* remove files including
+   
 * backup history files */

The indentation appears to be inconsistent.


-* Truncation is essentially harmless, because we skip names of
-* length other than XLOG_FNAME_LEN.  (In principle, one could 
use
-* a 1000-character additional_ext and get trouble.)
+* Truncation is essentially harmless,  because we check the 
file
+* format including the length immediately after this.
+* (In principle, one could use a 1000-character additional_ext
+* and get trouble.)
 */
strlcpy(walfile, xlde->d_name, MAXPGPATH);
TrimExtension(walfile, additional_ext);

The revised comment seems to drift from the original point. Except for
a potential exception by a too-long addition_ext, the original comment
asserted that the name truncation was safe since it wouldn't affect
the files we're removing. In other words, it asserted that the
filenames to be removed won't be truncated and any actual truncation
wouldn't lead to accidental deletions.

Hence, I think we should adjust the comment to maintain its original
point, and extend it to include backup history files. A possible
revision could be (very simple):

>* Truncation is essentially harmless, because we skip names of 
> length
>* longer than the length of backup history file. (In 
> principle, one
>* could use a 1000-character additional_ext and get trouble.)


Regarding the TAP test, it checks that the --clean-backup-history does
indeed remove backup history files. However, it doesn't check that
this doesn't occur if the option isn't specified. Shouldn't we test
for the latter scenario as well?


+sub get_walfiles
+{

+
+create_files(get_walfiles(@walfiles_with_gz));

The new function get_walfiels() just increases the line count without
cutting any lines. The following changes are sufficient and easier to
read (at least for me).

>   open my $file, '>', "$tempdir/$fn->{name}";

>   foreach my $fn (map {$_->{name}} @walfiles_with_gz)

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Can JoinFilter condition be pushed down into IndexScan?

2023-06-22 Thread Bəxtiyar Neyman
Thanks, Tomas!

> I know, but it makes them harder to read for people. If you want people
> to respond it's generally a good idea to make it easy to understand the
> question. Don't make them waste their time - they'll just skip the
> message entirely.

Fair point.


> So, the optimizer clearly believes the subquery case has cost 9.15,
> while the inner join case costs 2.15. So it believes the plan is
> "cheaper" than the subquery. So even if it knew how to do the
> transformation / build the other plan (which I'm not sure it can), it
> probably wouldn't do it.

> AFAICS there's no chance to make this bit smarter until the estimates
> get much better to reality.

Got it. Thanks. I guess we'll have to emit correlated subqueries/CTEs.

Sincerely,
Bakhtiyar

On Wed, Jun 21, 2023 at 12:58 PM Tomas Vondra 
wrote:

> On 6/21/23 20:37, Bəxtiyar Neyman wrote:
> > Thanks Tomas for the lengthy write-up!
> >
> > Pardon the noise in the queries (LATERAL, AND true etc): they were
> > autogenerated by the library we wrote.
> >
>
> I know, but it makes them harder to read for people. If you want people
> to respond it's generally a good idea to make it easy to understand the
> question. Don't make them waste their time - they'll just skip the
> message entirely.
>
> >> Because those queries are not doing the same thing. In the first query
> >> you sort by t3_0 columns, while the "id = 4732455" condition is on the
> >> other table. And so it can't use the index scan for sorting.
> >>
> >> While in the second query it can do that, and it doesn't need to do the
> >> explicit sort (which needs to fetch all the rows etc.).
> >
> > Let me try to explain what both of my queries do:
> > 1) Get the rank of the user using its id (id = 4732455 in this example,
> > but it could have been one that exists, e.g. id = 500). This is LATERAL
> > t3_1 in the first query and subquery in the WHERE clause of the second
> > query.
> > 2) Using that rank, get the next 10 users by rank. This is t3_0.
> >
> > Thus I can't just change the first query to "ORDER BY t3_1."rank" DESC,
> > t3_1."id" DESC" as you suggest, because then the order of returned rows
> > will not be guaranteed. In fact, such a clause will have no effect
> > because there is going to be at most one row supplied by t3_1 anyway.
> >
>
> Ah, OK. I got this wrong.
>
> > My question thus still stands. The planner knows that t3_1 has at most
> > one row, and it knows that t3_0 can produce up to 5000 rows. Yet, it
> > doesn't figure out that it could have lowered the Join Filter condition
> > from the first plan as an Index Cond of the Index Scan of t3_1. Is there
> > a fundamental reason for this, or is this something worth improving in
> > the planner?
> >
>
> As I tried to explain before, I don't think the problem is in the
> planner not being able to do this transformation, but more likely in not
> being able to cost it correctly.
>
> Consider this (with 1M rows in the user_ranks table):
>
> 1) subquery case
> =
>
>  Limit  (cost=8.87..9.15 rows=10 width=8) (actual time=0.032..0.037
> rows=10 loops=1)
>Output: t3_0.id, t3_0.rank
>InitPlan 1 (returns $0,$1)
>  ->  Index Scan using user_ranks_pkey on public.user_ranks t4_0
> (cost=0.42..8.44 rows=1 width=8) (actual time=0.017..0.019 rows=1 loops=1)
>Output: t4_0.rank, t4_0.id
>Index Cond: (t4_0.id = 33)
>->  Index Only Scan Backward using "by (rank, id)" on
> public.user_ranks t3_0  (cost=0.42..9493.75 rows=33 width=8) (actual
> time=0.031..0.033 rows=10 loops=1)
>  Output: t3_0.id, t3_0.rank
>  Index Cond: (ROW(t3_0.rank, t3_0.id) <= ROW($0, $1))
>  Heap Fetches: 0
>  Planning Time: 0.072 ms
>  Execution Time: 0.055 ms
> (12 rows)
>
>
> 2) join
> ===
>
>  Limit  (cost=0.85..2.15 rows=10 width=8) (actual time=464.662..464.672
> rows=10 loops=1)
>Output: t3_0.id, t3_0.rank
>->  Nested Loop  (cost=0.85..43488.87 rows=33 width=8) (actual
> time=464.660..464.667 rows=10 loops=1)
>  Output: t3_0.id, t3_0.rank
>  Inner Unique: true
>  Join Filter: (ROW(t3_0.rank, t3_0.id) <= ROW(t4_0.rank, t4_0.id))
>  Rows Removed by Join Filter: 67
>  ->  Index Only Scan Backward using "by (rank, id)" on
> public.user_ranks t3_0  (cost=0.42..25980.42 rows=100 width=8)
> (actual time=0.015..93.703 rows=77 loops=1)
>Output: t3_0.rank, t3_0.id
>Heap Fetches: 0
>  ->  Materialize  (cost=0.42..8.45 rows=1 width=8) (actual
> time=0.000..0.000 rows=1 loops=77)
>Output: t4_0.rank, t4_0.id
>->  Index Scan using user_ranks_pkey on public.user_ranks
> t4_0  (cost=0.42..8.44 rows=1 width=8) (actual time=0.010..0.011 rows=1
> loops=1)
>  Output: t4_0.rank, t4_0.id
>  Index Cond: (t4_0.id = 33)
>  Planning Time: 0.092 ms
>  Execution Time: 464.696 ms
> (17 rows)
>
>
> 3) join (with LEFT 

Re: Can JoinFilter condition be pushed down into IndexScan?

2023-06-22 Thread Bəxtiyar Neyman
Thanks Tomas for the lengthy write-up!

Pardon the noise in the queries (LATERAL, AND true etc): they were
autogenerated by the library we wrote.

> Because those queries are not doing the same thing. In the first query
> you sort by t3_0 columns, while the "id = 4732455" condition is on the
> other table. And so it can't use the index scan for sorting.
>
> While in the second query it can do that, and it doesn't need to do the
> explicit sort (which needs to fetch all the rows etc.).

Let me try to explain what both of my queries do:
1) Get the rank of the user using its id (id = 4732455 in this example, but
it could have been one that exists, e.g. id = 500). This is LATERAL t3_1 in
the first query and subquery in the WHERE clause of the second query.
2) Using that rank, get the next 10 users by rank. This is t3_0.

Thus I can't just change the first query to "ORDER BY t3_1."rank" DESC,
t3_1."id" DESC" as you suggest, because then the order of returned rows
will not be guaranteed. In fact, such a clause will have no effect because
there is going to be at most one row supplied by t3_1 anyway.

My question thus still stands. The planner knows that t3_1 has at most one
row, and it knows that t3_0 can produce up to 5000 rows. Yet, it doesn't
figure out that it could have lowered the Join Filter condition from the
first plan as an Index Cond of the Index Scan of t3_1. Is there a
fundamental reason for this, or is this something worth improving in the
planner?

Sincerely,
Bakhtiyar

On Wed, Jun 21, 2023 at 6:28 AM Tomas Vondra 
wrote:

>
> On 6/21/23 05:37, Bəxtiyar Neyman wrote:
> > I define a table user_ranks as such:
> >
> > CREATE TABLE user_ranks (
> >   id INTEGER GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
> >   rank INTEGER NOT NULL,
> >   CONSTRAINT "by (rank, id)" UNIQUE (rank, id)
> > );
> >
> > INSERT INTO user_ranks (user_id, rank) SELECT generate_series(1, 1),
> > generate_series(1, 1);
> >
>
> This doesn't work, the INSERT needs to only insert into (rank).
>
> > Here's a query I'd like to optimize:
> >
> > explain (analyze,verbose)
> > SELECT
> >   t3_0."id" AS "id",
> >   t3_0."rank" AS "rank"
> > FROM
> >   LATERAL (
> > SELECT
> >   t4_0."rank" AS "rank"
> > FROM
> >   user_ranks AS t4_0
> > WHERE
> >   (t4_0."id" = 4732455)
> >   ) AS t3_1
> >   INNER JOIN user_ranks AS t3_0 ON true
> > WHERE
> >   (
> > ((t3_0."rank", t3_0."id") <= (t3_1."rank", 4732455))
> > AND true
> >   )
> > ORDER BY
> >   t3_0."rank" DESC,
> >   t3_0."id" DESC
> > LIMIT
> >   10
> >
>
> Not sure why you make the query unnecessarily complicated - the LATERAL
> is pointless I believe, the "AND true" just make it harder to read.
> Let's rewrite it it like this to make discussion easier:
>
> explain (analyze,verbose)
> SELECT
>   t3_0."id" AS "id",
>   t3_0."rank" AS "rank"
> FROM
>   user_ranks AS t3_1
>   INNER JOIN user_ranks AS t3_0
>   ON ((t3_0."rank", t3_0."id") <= (t3_1."rank", t3_1."id"))
> WHERE
>   t3_1."id" = 4732455
> ORDER BY
>   t3_0."rank" DESC,
>   t3_0."id" DESC
> LIMIT
>   10
>
> Same query, but perhaps easier to read.
>
> > It compiles to the following plan:
> >
> >  Limit  (cost=0.56..250.94 rows=10 width=12) (actual time=8.078..8.078
> > rows=1 loops=1)
> >Output: t3_0.id , t3_0.rank
> >->  Nested Loop  (cost=0.56..41763.27 rows=1668 width=12) (actual
> > time=8.075..8.076 rows=1 loops=1)
> >  Output: t3_0.id , t3_0.rank
> >  Inner Unique: true
> >  Join Filter: (ROW(t3_0.rank, t3_0.id ) <=
> > ROW(t4_0.rank, 4732455))
> >  Rows Removed by Join Filter: 5002
> >  ->  Index Only Scan Backward using "by (rank,id)" on
> > public.user_ranks t3_0  (cost=0.28..163.33 rows=5003 width=12) (actual
> > time=0.023..0.638 rows=5003 loops=1)
> >Output: t3_0.rank, t3_0.id 
> >Heap Fetches: 0
> >  ->  Index Scan using "by id" on public.user_ranks t4_0
> >  (cost=0.28..8.30 rows=1 width=8) (actual time=0.001..0.001 rows=1
> > loops=5003)
> >Output: t4_0.id , t4_0.rating, t4_0.rank
> >Index Cond: (t4_0.id  = 4732455)
> >
> > As you can see, there are a lot of rows returned by t3_0, which are then
> > filtered by Join Filter. But it would have been better if instead of the
> > filter, the  t3_0 table would have an Index Cond. Similar to how it
> > happens when a correlated subquery is used (or a CTE)
> >
> > explain (analyze,verbose)
> > SELECT
> >   t3_0."id" AS "id",
> >   t3_0."rank" AS "rank"
> > FROM
> >   user_ranks AS t3_0
> > WHERE
> >   (
> > ((t3_0."rank", t3_0."id") <= (
> > SELECT
> >   t4_0."rank" AS "rank",
> >   t4_0."id" AS "id"
> > FROM
> >   user_ranks AS t4_0
> > WHERE
> >   (t4_0."id" = 4732455)
> > ))
> > AND true
> >   )
> > ORDER BY
> >   t3_0."rank" DESC,
> >   t3_0."id" DESC
> > LIMIT
> >   10
> >

Re: BUG #17946: LC_MONETARY & DO LANGUAGE plperl - BUG

2023-06-22 Thread Heikki Linnakangas

On 21/06/2023 01:02, Joe Conway wrote:

On 6/19/23 19:30, Heikki Linnakangas wrote:

On 18/06/2023 21:27, Joe Conway wrote:
With the patch you're proposing, do we now have a coding rule that you
must call "uselocale(LC_GLOBAL_LOCALE)" before every and any call to
setlocale()? If so, you missed a few spots: pg_perm_setlocale,
pg_bind_textdomain_codeset, and cache_locale_time.


Well I was not proposing such a rule (trying to stay narrowly focused on
the demonstrated issue) but I suppose it might make sense. Anywhere we
use setlocale() we are depending on subsequent locale operations to use
the global locale. And uselocale(LC_GLOBAL_LOCALE) itself looks like it
ought to be pretty cheap.


The current locale affects a lot of other things than localeconv()
calls. For example, LC_MESSAGES affects all strerror() calls. Do we need
to call "uselocale(LC_GLOBAL_LOCALE)" before all possible strerror()
calls too?


That seems heavy handed


Yet I think that's exactly where this is heading. See this case (for 
gettext() rather than strerror()):


postgres=# set lc_messages ='sv_SE.UTF8';
SET
postgres=# this prints syntax error in Swedish;
FEL:  syntaxfel vid eller nära "this"
LINE 1: this prints syntax error in Swedish;
^
postgres=# load 'plperl';
LOAD
postgres=# set lc_messages ='en_GB.utf8';
SET
postgres=# this *should* print syntax error in English;
FEL:  syntaxfel vid eller nära "this"
LINE 1: this *should* print syntax error in English;
^


I think we should call "uselocale(LC_GLOBAL_LOCALE)" immediately after
returning from the perl interpreter, instead of before setlocale()
calls, if we want all Postgres code to run with the global locale. Not
sure how much performance overhead that would have.


I don't see how that is practical, or at least it does not really
address the issue. I think any loaded shared library could cause the
same problem by running newlocale() + uselocale() on init. Perhaps I
should go test that theory though.


Any shared library could do that, that's true. Any shared library could 
also call 'chdir'. But most shared libraries don't. I think it's the 
responsibility of the extension that loads the shared library, plperl in 
this case, to make sure it doesn't mess up the environment for the 
postgres backend.



Testing the patch, I bumped into this:

postgres=# create or replace function finnish_to_number() returns
numeric as $$ select to_number('1,23', '9D99'); $$ language sql set
lc_numeric to 'fi_FI.utf8';
CREATE FUNCTION
postgres=# DO LANGUAGE 'plperlu' $$
use POSIX qw(setlocale LC_NUMERIC);
use locale;

setlocale LC_NUMERIC, "fi_FI.utf8";

$n = 5/2;   # Assign numeric 2.5 to $n

spi_exec_query('SELECT finnish_to_number()');

$a = " $n"; # Locale-dependent conversion to string
elog(NOTICE, "half five is $n");   # Locale-dependent output
$$;
NOTICE:  half five is 2,5
DO
postgres=# select to_char(now(), 'Day');
WARNING:  could not determine encoding for locale "en_GB.UTF-8": codeset
is "ANSI_X3.4-1968"
 to_char
---
Tuesday
(1 row)


Do you think that is because uselocale(LC_GLOBAL_LOCALE) pulls out the
rug from under perl?


libperl is fine in this case. But cache_locale_time() also calls 
setlocale(), and your patch didn't add the "uselocale(LC_GLOBAL_LOCALE)" 
there.


It's a valid concern that "uselocale(LC_GLOBAL_LOCALE)" could pull the 
rug from under perl. I tried to find issues like that, by calling 
locale-dependent functions in plperl, with SQL functions that call 
"uselocale(LC_GLOBAL_LOCALE)" via PGLC_localeconv() in between. But I 
couldn't find any case where the perl code would misbehave. I guess 
libperl calls uselocale() before any locale-dependent function, but I 
didn't look very closely.


--
Heikki Linnakangas
Neon (https://neon.tech)





Re: allow granting CLUSTER, REFRESH MATERIALIZED VIEW, and REINDEX

2023-06-22 Thread Michael Paquier
On Wed, Jun 21, 2023 at 08:06:06PM -0700, Nathan Bossart wrote:
> On Thu, Jun 22, 2023 at 10:46:41AM +0900, Michael Paquier wrote:
>> -   /*
>> -* We already checked that the user has privileges to CLUSTER the
>> -* partitioned table when we locked it earlier, so there's no need to
>> -* check the privileges again here.
>> -*/
>> +   if (!cluster_is_permitted_for_relation(relid, GetUserId()))
>> +   continue;
>> I would add a comment here that this ACL recheck for the leaves is an
>> important thing to keep around as it impacts the case where the leaves
>> have a different owner than the parent, and the owner of the parent
>> clusters it.  The only place in the tests where this has an influence
>> is the isolation test cluster-conflict-partition.
> 
> Done.

+   /*
+* It's possible that the user does not have privileges to CLUSTER the
+* leaf partition despite having such privileges on the partitioned
+* table.  We skip any partitions which the user is not permitted to
+* CLUSTER.
+*/

Sounds good to me.  Thanks.
--
Michael


signature.asc
Description: PGP signature


Re: [DOC] Update ALTER SUBSCRIPTION documentation v3

2023-06-22 Thread Amit Kapila
On Tue, Jun 20, 2023 at 9:02 AM Peter Smith  wrote:
>
> FYI - I have created and tested back-patches for Amit's v5 patch,
> going all the way to REL_10_STABLE.
>

Pushed. I haven't used PG10 patch as REL_10_STABLE is out of support now.

-- 
With Regards,
Amit Kapila.




Re: ProcessStartupPacket(): database_name and user_name truncation

2023-06-22 Thread Drouvot, Bertrand

Hi,

On 6/22/23 1:37 AM, Michael Paquier wrote:

On Wed, Jun 21, 2023 at 12:55:15PM -0700, Nathan Bossart wrote:

LGTM.  I think this can wait for v17 since the current behavior has been
around since 2001 and AFAIK this is the first report.  While it's arguably
a bug fix, the patch also breaks some cases that work today.


Agreed that anything discussed on this thread does not warrant a
backpatch.


Fully agree, the CF entry [1] has been tagged as "Target Version 17".

[1] https://commitfest.postgresql.org/43/4383/

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com