Re: Preventing non-superusers from altering session authorization

2023-07-08 Thread Nathan Bossart
On Sat, Jul 08, 2023 at 07:08:35PM -0400, Joseph Koshakow wrote:
> On Sat, Jul 8, 2023 at 6:09 PM Nathan Bossart wrote:
> 
>>> I think the issue here is that if a session loses the ability to set
>>> their session authorization in the middle of a transaction, then
>>> rolling back the transaction may fail and cause the server to panic.
>>> That's probably what the deleted comment meant when it said:
>>>
>>>> * It's OK because the check does not require catalog access and can't
>>>> * fail during an end-of-transaction GUC reversion
>>
>> Yeah.  IIUC the ERROR longjmps to a block that calls AbortTransaction(),
>> which ERRORs again when resetting the session authorization, which causes
>> us to call AbortTransaction() again, etc., etc.

src/backend/utils/misc/README has the following relevant text:

Note that there is no provision for a failure result code.  assign_hooks
should never fail except under the most dire circumstances, since a failure
may for example result in GUC settings not being rolled back properly during
transaction abort.  In general, try to do anything that could conceivably
fail in a check_hook instead, and pass along the results in an "extra"
struct, so that the assign hook has little to do beyond copying the data to
someplace.  This applies particularly to catalog lookups: any required
lookups must be done in the check_hook, since the assign_hook may be
executed during transaction rollback when lookups will be unsafe.
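
As a concrete illustration of that division of labor, here is a minimal
sketch for a hypothetical string GUC (the MyExtra struct, resolve_name(),
and MyResolvedOid are invented for the example; only guc_malloc() and the
hook signatures are real):

typedef struct MyExtra
{
    Oid         resolved_oid;
} MyExtra;

static bool
check_my_setting(char **newval, void **extra, GucSource source)
{
    MyExtra    *myextra;

    /* Anything that can fail -- catalog lookups, validation -- goes here */
    myextra = (MyExtra *) guc_malloc(LOG, sizeof(MyExtra));
    if (!myextra)
        return false;
    myextra->resolved_oid = resolve_name(*newval);  /* invented lookup */
    *extra = myextra;
    return true;
}

static void
assign_my_setting(const char *newval, void *extra)
{
    /* Must not fail: only copy the precomputed result into place */
    MyResolvedOid = ((MyExtra *) extra)->resolved_oid;
}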

> Everything seems to work fine if the privilege check is moved to
> check_session_authorization. Which is maybe what the comment meant
> instead of assign_session_authorization.

Ah, that does make more sense.

I think we should split this into two patches: one to move the permission
check to check_session_authorization() and another for the behavior change.
I've attached an attempt at the first one (that borrows heavily from your
latest patch).  AFAICT the only reason that the permission check lives in
SetSessionAuthorization() is because AuthenticatedUserIsSuperuser is static
to miscinit.c and doesn't have an accessor function.  I added one, but it
would probably just be removed by the following patch.  WDYT?

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
From 4972f29cef6dfbb948e843517ce5ff413628c2f1 Mon Sep 17 00:00:00 2001
From: Nathan Bossart 
Date: Sat, 8 Jul 2023 21:35:03 -0700
Subject: [PATCH v4 1/1] move session auth permission check

---
 src/backend/commands/variable.c   | 23 +++
 src/backend/utils/init/miscinit.c | 29 ++---
 src/include/miscadmin.h   |  1 +
 3 files changed, 34 insertions(+), 19 deletions(-)

diff --git a/src/backend/commands/variable.c b/src/backend/commands/variable.c
index f0f2e07655..f8e308f1d0 100644
--- a/src/backend/commands/variable.c
+++ b/src/backend/commands/variable.c
@@ -846,6 +846,29 @@ check_session_authorization(char **newval, void **extra, GucSource source)
 
 	ReleaseSysCache(roleTup);
 
+	/*
+	 * Only a superuser may set auth ID to something other than himself.  Note
+	 * that in case of multiple SETs in a single session, the original
+	 * userid's superuserness is what matters.  But we set the GUC variable
+	 * is_superuser to indicate whether the *current* session userid is a
+	 * superuser.
+	 */
+	if (roleid != GetAuthenticatedUserId() &&
+		!GetAuthenticatedUserIsSuperuser())
+	{
+		if (source == PGC_S_TEST)
+		{
+			ereport(NOTICE,
+	(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+	 errmsg("permission will be denied to set session authorization \"%s\"",
+			*newval)));
+			return true;
+		}
+		GUC_check_errcode(ERRCODE_INSUFFICIENT_PRIVILEGE);
+		GUC_check_errmsg("permission denied to set session authorization");
+		return false;
+	}
+
 	/* Set up "extra" struct for assign_session_authorization to use */
 	myextra = (role_auth_extra *) guc_malloc(LOG, sizeof(role_auth_extra));
 	if (!myextra)
diff --git a/src/backend/utils/init/miscinit.c b/src/backend/utils/init/miscinit.c
index a604432126..f5548a0f47 100644
--- a/src/backend/utils/init/miscinit.c
+++ b/src/backend/utils/init/miscinit.c
@@ -582,6 +582,16 @@ GetAuthenticatedUserId(void)
 	return AuthenticatedUserId;
 }
 
+/*
+ * Return whether the authenticated user was superuser at connection start.
+ */
+bool
+GetAuthenticatedUserIsSuperuser(void)
+{
+	Assert(OidIsValid(AuthenticatedUserId));
+	return AuthenticatedUserIsSuperuser;
+}
+
 
 /*
  * GetUserIdAndSecContext/SetUserIdAndSecContext - get/set the current user ID
@@ -888,29 +898,10 @@ system_user(PG_FUNCTION_ARGS)
 
 /*
  * Change session auth ID while running
- *
- * Only a superuser may set auth ID to something other than himself.  Note
- * that in case of multiple SETs in a single session, the original userid's
- * superuserness is what matters.  But we set the GUC variable is_superuser
- * to indicate whether the *current* session userid is a 

Re: Autogenerate some wait events code and documentation

2023-07-08 Thread Michael Paquier
On Fri, Jul 07, 2023 at 01:49:24PM +0900, Michael Paquier wrote:
> Hmm.  If we go down this road I would make the choice of simplicity
> and remove a column entirely, then generate the snakecase from the
> camelcase or vice-versa (say like a $string =~ s/([a-z]+)/$1_/g;),
> even if it means having slightly incompatible strings shown to the
> users.  And I'd rather minimize the number of exceptions we need to
> handle in this automation (aka no exception rules for some keywords
> like "SSL" or "WAL", etc.).

After pondering more about that, the attached patch set does exactly
that.  Patch 0001 includes an update of the wait event names so that
these are more consistent with the enum elements generated.  With this
change, users can apply lower() or upper() across monitoring queries
and still get the same results as before.  An exception was the
message queue events, where the enums used "MQ" but the event names
used "MessageQueue"; this concerns only four lines of code in the
backend.  The newly-generated enum elements match the existing
ones, except for MQ.

Patch 0002 introduces a set of simplifications for the format of
wait_event_names.txt:
- Removal of the first column for the enums.
- Removal of the quotes for the event name.  We have a single keyword
for these, so that's kind of annoying to cope with that for new
entries.
- Build of the enum elements using the event names, by applying a
rebuild as simple as that:
+   $waiteventenumname =~ s/([a-z])([A-Z])/$1_$2/g;
+   $waiteventenumname = uc($waiteventenumname);

Thoughts?
--
Michael
From 30dff3cc8ab825bc968ecfed13eecee6725f6a80 Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Sun, 9 Jul 2023 13:24:54 +0900
Subject: [PATCH 1/2] Rename wait events with more consistent camelcase style

This will help in the introduction of more simplifications with the
generation of wait event data using wait_event_names.txt.  The event
names used the same case-insensitive characters, hence applying lower()
or upper() to the monitoring queries allows the detection of the same
events as before this change.
---
 src/backend/libpq/pqmq.c  |  2 +-
 src/backend/storage/ipc/shm_mq.c  |  6 +-
 .../utils/activity/wait_event_names.txt   | 90 +--
 3 files changed, 49 insertions(+), 49 deletions(-)

diff --git a/src/backend/libpq/pqmq.c b/src/backend/libpq/pqmq.c
index 39e77ef945..87b8570a42 100644
--- a/src/backend/libpq/pqmq.c
+++ b/src/backend/libpq/pqmq.c
@@ -182,7 +182,7 @@ mq_putmessage(char msgtype, const char *s, size_t len)
 			break;
 
 		(void) WaitLatch(MyLatch, WL_LATCH_SET | WL_EXIT_ON_PM_DEATH, 0,
-		 WAIT_EVENT_MQ_PUT_MESSAGE);
+		 WAIT_EVENT_MESSAGE_QUEUE_PUT_MESSAGE);
 		ResetLatch(MyLatch);
 		CHECK_FOR_INTERRUPTS();
 	}
diff --git a/src/backend/storage/ipc/shm_mq.c b/src/backend/storage/ipc/shm_mq.c
index d134a09a19..06d6b73527 100644
--- a/src/backend/storage/ipc/shm_mq.c
+++ b/src/backend/storage/ipc/shm_mq.c
@@ -1017,7 +1017,7 @@ shm_mq_send_bytes(shm_mq_handle *mqh, Size nbytes, const void *data,
 			 * cheaper than setting one that has been reset.
 			 */
 			(void) WaitLatch(MyLatch, WL_LATCH_SET | WL_EXIT_ON_PM_DEATH, 0,
-			 WAIT_EVENT_MQ_SEND);
+			 WAIT_EVENT_MESSAGE_QUEUE_SEND);
 
 			/* Reset the latch so we don't spin. */
 			ResetLatch(MyLatch);
@@ -1163,7 +1163,7 @@ shm_mq_receive_bytes(shm_mq_handle *mqh, Size bytes_needed, bool nowait,
 		 * setting one that has been reset.
 		 */
 		(void) WaitLatch(MyLatch, WL_LATCH_SET | WL_EXIT_ON_PM_DEATH, 0,
-		 WAIT_EVENT_MQ_RECEIVE);
+		 WAIT_EVENT_MESSAGE_QUEUE_RECEIVE);
 
 		/* Reset the latch so we don't spin. */
 		ResetLatch(MyLatch);
@@ -1252,7 +1252,7 @@ shm_mq_wait_internal(shm_mq *mq, PGPROC **ptr, BackgroundWorkerHandle *handle)
 
 		/* Wait to be signaled. */
 		(void) WaitLatch(MyLatch, WL_LATCH_SET | WL_EXIT_ON_PM_DEATH, 0,
-		 WAIT_EVENT_MQ_INTERNAL);
+		 WAIT_EVENT_MESSAGE_QUEUE_INTERNAL);
 
 		/* Reset the latch so we don't spin. */
 		ResetLatch(MyLatch);
diff --git a/src/backend/utils/activity/wait_event_names.txt b/src/backend/utils/activity/wait_event_names.txt
index 7af1a61f95..e706dfd8f3 100644
--- a/src/backend/utils/activity/wait_event_names.txt
+++ b/src/backend/utils/activity/wait_event_names.txt
@@ -45,15 +45,15 @@
 Section: ClassName - WaitEventActivity
 
 WAIT_EVENT_ARCHIVER_MAIN	"ArchiverMain"	"Waiting in main loop of archiver process."
-WAIT_EVENT_AUTOVACUUM_MAIN	"AutoVacuumMain"	"Waiting in main loop of autovacuum launcher process."
-WAIT_EVENT_BGWRITER_HIBERNATE	"BgWriterHibernate"	"Waiting in background writer process, hibernating."
-WAIT_EVENT_BGWRITER_MAIN	"BgWriterMain"	"Waiting in main loop of background writer process."
+WAIT_EVENT_AUTOVACUUM_MAIN	"AutovacuumMain"	"Waiting in main loop of autovacuum launcher process."
+WAIT_EVENT_BGWRITER_HIBERNATE	"BgwriterHibernate"	"Waiting in background writer process, hibernating."
+WAIT_EVENT_BGWRITER_MAIN	"BgwriterMain"	

Re: Check lateral references within PHVs for memoize cache keys

2023-07-08 Thread David Rowley
On Sat, 8 Jul 2023 at 17:24, Paul A Jungwirth wrote:
> One adjacent thing I noticed is that when we renamed "Result Cache" to
> "Memoize" this bit of the docs in config.sgml got skipped (probably
> because of the line break):
>
> Hash tables are used in hash joins, hash-based aggregation, result
> cache nodes and hash-based processing of IN
> subqueries.
>
> I believe that should say "memoize nodes" instead. Is it worth
> correcting that as part of this patch? Or perhaps another one?

I just pushed a fix for this.  Thanks for reporting it.

David




Re: Initial Schema Sync for Logical Replication

2023-07-08 Thread Amit Kapila
On Wed, Jul 5, 2023 at 7:45 AM Masahiko Sawada  wrote:
>
> On Mon, Jun 19, 2023 at 5:29 PM Peter Smith  wrote:
> >
> > Hi,
> >
> > Below are my review comments for the PoC patch 0001.
> >
> > In addition,  the patch needed rebasing, and, after I rebased it
> > locally in my private environment there were still test failures:
> > a) The 'make check' tests fail but only in a minor way due to a changed
> > colname
> > b) the subscription TAP test did not work at all for me -- many errors.
>
> Thank you for reviewing the patch.
>
> While updating the patch, I realized that the current approach won't
> work well, or at least has a problem with partitioned tables. If a
> publication has a partitioned table with publish_via_root = false, the
> subscriber launches tablesync workers for its partitions so that each
> tablesync worker copies data of each partition. Similarly, if it has a
> partitioned table with publish_via_root = true, the subscriber launches
> a tablesync worker for the parent table. With the current design,
> since the tablesync worker is responsible for both schema and data
> synchronization for the target table, it won't be possible to
> synchronize both the parent table's schema and the partitions' schema.
>

I think one possibility to make this design work is that when
publish_via_root is false, we assume that the subscriber already has
the parent table, and then the individual tablesync workers can sync
the schema of the partitions and their data. And when publish_via_root
is true, the tablesync worker is responsible for syncing the parent
and child tables along with the data. Do you think such a mechanism
can address the partitioned table related cases?

-- 
With Regards,
Amit Kapila.




Re: [feature]COPY FROM enable FORCE_NULL/FORCE_NOT_NULL on all columns

2023-07-08 Thread Zhang Mingli
Hi,

Regards,
Zhang Mingli
On Jul 7, 2023, 18:00 +0800, Damir Belyalov wrote:
>
> The patch does not work for the current version of postgres, it needs to be 
> updated.
> I tested your patch. Everything looks simple and works well.
>
> There is a suggestion to simplify the code: instead of using
>
> if (cstate->opts.force_notnull_all)
> {
>     int i;
>
>     for (i = 0; i < num_phys_attrs; i++)
>         cstate->opts.force_notnull_flags[i] = true;
> }
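
Presumably the simpler replacement meant here is a one-line fill along
these lines (a sketch, not the reviewer's literal suggestion; memset on a
bool array is the usual C idiom):

if (cstate->opts.force_notnull_all)
    memset(cstate->opts.force_notnull_flags, true,
           num_phys_attrs * sizeof(bool));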

Thanks very much for review.

Nice suggestion, patch rebased and updated.



v2-0001-COPY-FROM-enable-FORCE_NULL-FORCE_NOT_NULL-on-all-co.patch
Description: Binary data


Re: Exclusion constraints on partitioned tables

2023-07-08 Thread Paul A Jungwirth
On Thu, Jul 6, 2023 at 1:03 AM Peter Eisentraut  wrote:
> This looks all pretty good to me.  A few more comments:

Thanks for the feedback! New patch attached here. Responses below:

> It seems to me that many of the test cases added in indexing.sql are
> redundant with create_table.sql/alter_table.sql (or vice versa).  Is
> there a reason for this?

Yes, there is some overlap. I think that's just because there was
overlap before, and I didn't want to delete the old tests completely.
But since indexing.sql has a fuller list of tests and is a superset of
the others, this new patch removes the redundant tests from
{create,alter}_table.sql.

Btw speaking of tests, I want to make sure this new feature will still
work when you're using btree_gist and `EXCLUDE WITH (myint =,
mytsrange &&)` (and not just `(myint4range =, mytsrange &&)`). Some of
my early attempts writing this patch worked w/o btree_gist but not w/
(or vice versa). But as far as I know there's no way to test that in
regress. I wound up writing a private shell script that just does
this:

```

-- test against btree_gist since we can't do that in the postgres
regress test suite:

CREATE EXTENSION btree_gist;

create table partitioned (id int, valid_at tsrange, exclude using gist
(id with =, valid_at with &&)) partition by range (id);
-- should fail with a good error message:
create table partitioned2 (id int, valid_at tsrange, exclude using
gist (id with <>, valid_at with &&)) partition by range (id);
```

Is there some place in the repo to include a test like that? It seems
a little funny to put it in the btree_gist suite, but maybe that's the
right answer.

> This is not really a problem in your patch, but I think in
>
> -   if (partitioned && (stmt->unique || stmt->primary))
> +   if (partitioned && (stmt->unique || stmt->primary ||
> stmt->excludeOpNames != NIL))
>
> the stmt->primary is redundant and should be removed.  Right now
> "primary" is always a subset of "unique", but presumably a future patch
> of yours wants to change that.

Done! I don't think my temporal work changes that primary ⊆ unique. It
does change that some primary/unique constraints will have non-null
excludeOpNames, which will require small changes here eventually. But
that should be part of the temporal patches, not this one.

> Furthermore, I think it would be more elegant in your patch if you wrote
> stmt->excludeOpNames without the "== NIL" or "!= NIL", so that it
> becomes a peer of stmt->unique.  (I understand some people don't like
> that style.  But it is already used in that file.)

Done.

> I would consider rearranging some of the conditionals more as a
> selection of cases, like "is it a unique constraint?", "else, is it an
> exclusion constraint?" -- rather than the current "is it an exclusion
> constraint?, "else, various old code".

Done.

> Also, I would push the code
>
>  if (accessMethodId == BTREE_AM_OID)
>  eq_strategy = BTEqualStrategyNumber;
>
> further down into the loop, so that you don't have to remember in which
> cases eq_strategy is assigned or not.
>
> (It's also confusing that the eq_strategy variable is used for two
> different things in this function, and that would clean that up.)

Agreed that it's confusing. Done.

> Finally, this code
>
> +   att = TupleDescAttr(RelationGetDescr(rel),
> +   key->partattrs[i] - 1);
> +   ereport(ERROR,
> +   (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
> +errmsg("cannot match partition key
> to index on column \"%s\" using non-equal operator \"%s\".",
> +   NameStr(att->attname),
> get_opname(indexInfo->ii_ExclusionOps[j]))));
>
> could be simplified by using get_attname().

Okay, done. I changed the similar error message just below too.

> This is all just a bit of polishing.  I think it would be good to go
> after that.

Thanks!

-- 
Paul  ~{:-)
p...@illuminatedcomputing.com
From ca3a54d78dce9b2d553f37e769ccc65fbf579f42 Mon Sep 17 00:00:00 2001
From: "Paul A. Jungwirth" 
Date: Wed, 23 Nov 2022 14:55:43 -0800
Subject: [PATCH v4] Allow some exclusion constraints on partitions

Previously we only allowed UNIQUE B-tree constraints on partitions
(and only if the constraint included all the partition keys). But we
could allow exclusion constraints with the same restriction. We also
require that those columns be compared for equality, not something like
&&.
---
 doc/src/sgml/ddl.sgml  | 12 ++--
 src/backend/commands/indexcmds.c   | 63 ++-
 src/backend/parser/parse_utilcmd.c |  6 --
 src/test/regress/expected/alter_table.out  |  7 +--
 src/test/regress/expected/create_table.out |  8 ---
 src/test/regress/expected/indexing.out | 73 ++
 src/test/regress/sql/alter_table.sql   |  5 +-
 

Re: Check lateral references within PHVs for memoize cache keys

2023-07-08 Thread David Rowley
On Sun, 9 Jul 2023 at 05:28, Tom Lane  wrote:
> More generally, it's not clear to me why we should need to look inside
> lateral PHVs in the first place.  Wouldn't the lateral PHV itself
> serve fine as a cache key?

For Memoize specifically, I purposefully made it so the expression was
used as a cache key rather than extracting the Vars from it and using
those.  The reason for that was that the expression may result in
fewer distinct values to cache tuples for. For example:

create table t1 (a int primary key);
create table t2 (a int primary key);

create statistics on (a % 10) from t2;
insert into t2 select x from generate_Series(1,100)x;
insert into t1 select x from generate_Series(1,100)x;

analyze t1,t2;
explain (analyze, costs off) select * from t1 inner join t2 on t1.a=t2.a%10;
 QUERY PLAN

 Nested Loop (actual time=0.015..212.798 rows=90 loops=1)
   ->  Seq Scan on t2 (actual time=0.006..33.479 rows=100 loops=1)
   ->  Memoize (actual time=0.000..0.000 rows=1 loops=100)
 Cache Key: (t2.a % 10)
 Cache Mode: logical
 Hits: 90  Misses: 10  Evictions: 0  Overflows: 0  Memory Usage: 1kB
 ->  Index Only Scan using t1_pkey on t1 (actual
time=0.001..0.001 rows=1 loops=10)
   Index Cond: (a = (t2.a % 10))
   Heap Fetches: 0
 Planning Time: 0.928 ms
 Execution Time: 229.621 ms
(11 rows)




Re: check_strxfrm_bug()

2023-07-08 Thread Thomas Munro
On Sat, Jul 8, 2023 at 10:13 AM Thomas Munro  wrote:
> Thanks.  I will wait a bit to see if Peter has any other comments and
> then push this.  I haven't actually tested with Solution.pm due to
> lack of CI for that, but fingers crossed, since the build systems will
> now agree, reducing the screw-up surface.

Done.  Let's see what the build farm thinks...




Re: Preventing non-superusers from altering session authorization

2023-07-08 Thread Joseph Koshakow
On Sat, Jul 8, 2023 at 6:09 PM Nathan Bossart wrote:

>> I think the issue here is that if a session loses the ability to set
>> their session authorization in the middle of a transaction, then
>> rolling back the transaction may fail and cause the server to panic.
>> That's probably what the deleted comment meant when it said:
>>
>>> * It's OK because the check does not require catalog access and can't
>>> * fail during an end-of-transaction GUC reversion
>
> Yeah.  IIUC the ERROR longjmps to a block that calls AbortTransaction(),
> which ERRORs again when resetting the session authorization, which causes
> us to call AbortTransaction() again, etc., etc.

Everything seems to work fine if the privilege check is moved to
check_session_authorization. Which is maybe what the comment meant
instead of assign_session_authorization.

I've attached a patch with this change.

Thanks,
Joe Koshakow
From cb0198524d96068079e301a6785301440f3be3aa Mon Sep 17 00:00:00 2001
From: Joseph Koshakow 
Date: Thu, 15 Jun 2023 14:53:11 -0400
Subject: [PATCH] Prevent non-superusers from altering session auth

Previously, if a user connected as a role that had the superuser
attribute, then they could always execute a SET SESSION AUTHORIZATION
statement for the duration of their session. Even if the role was
altered to set superuser to false, the user was still allowed to
execute SET SESSION AUTHORIZATION. This allowed them to set their
session role to some other superuser and effectively regain the
superuser privileges. They could even reset their own superuser
attribute to true.

This commit alters the privilege checks for SET SESSION AUTHORIZATION
such that a user can only execute it if the role they connected with is
currently a superuser. This prevents users from regaining superuser
privileges after it has been revoked.
---
 doc/src/sgml/ref/set_session_auth.sgml |  2 +-
 src/backend/commands/variable.c| 13 +++-
 src/backend/utils/init/miscinit.c  | 28 ++
 3 files changed, 19 insertions(+), 24 deletions(-)

diff --git a/doc/src/sgml/ref/set_session_auth.sgml b/doc/src/sgml/ref/set_session_auth.sgml
index f8fcafc194..94adab2468 100644
--- a/doc/src/sgml/ref/set_session_auth.sgml
+++ b/doc/src/sgml/ref/set_session_auth.sgml
@@ -51,7 +51,7 @@ RESET SESSION AUTHORIZATION
 
   
The session user identifier can be changed only if the initial session
-   user (the authenticated user) had the
+   user (the authenticated user) has the
superuser privilege.  Otherwise, the command is accepted only if it
specifies the authenticated user name.
   
diff --git a/src/backend/commands/variable.c b/src/backend/commands/variable.c
index f0f2e07655..e2f47eceb7 100644
--- a/src/backend/commands/variable.c
+++ b/src/backend/commands/variable.c
@@ -803,7 +803,8 @@ check_session_authorization(char **newval, void **extra, GucSource source)
 {
 	HeapTuple	roleTup;
 	Form_pg_authid roleform;
-	Oid			roleid;
+	Oid			roleid,
+authenticated_user_id;
 	bool		is_superuser;
 	role_auth_extra *myextra;
 
@@ -846,6 +847,16 @@ check_session_authorization(char **newval, void **extra, GucSource source)
 
 	ReleaseSysCache(roleTup);
 
+	authenticated_user_id = GetAuthenticatedUserId();
+	/* Must have authenticated already, else can't make permission check */
+	Assert(OidIsValid(authenticated_user_id));
+
+	if (roleid != authenticated_user_id &&
+		!superuser_arg(authenticated_user_id))
+		ereport(ERROR,
+(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+ errmsg("permission denied to set session authorization")));
+
 	/* Set up "extra" struct for assign_session_authorization to use */
 	myextra = (role_auth_extra *) guc_malloc(LOG, sizeof(role_auth_extra));
 	if (!myextra)
diff --git a/src/backend/utils/init/miscinit.c b/src/backend/utils/init/miscinit.c
index a604432126..04e019df20 100644
--- a/src/backend/utils/init/miscinit.c
+++ b/src/backend/utils/init/miscinit.c
@@ -467,7 +467,7 @@ ChangeToDataDir(void)
  * AuthenticatedUserId is determined at connection start and never changes.
  *
  * SessionUserId is initially the same as AuthenticatedUserId, but can be
- * changed by SET SESSION AUTHORIZATION (if AuthenticatedUserIsSuperuser).
+ * changed by SET SESSION AUTHORIZATION (if AuthenticatedUserId is a superuser).
  * This is the ID reported by the SESSION_USER SQL function.
  *
  * OuterUserId is the current user ID in effect at the "outer level" (outside
@@ -492,8 +492,6 @@ static Oid	OuterUserId = InvalidOid;
 static Oid	CurrentUserId = InvalidOid;
 static const char *SystemUser = NULL;
 
-/* We also have to remember the superuser state of some of these levels */
-static bool AuthenticatedUserIsSuperuser = false;
 static bool SessionUserIsSuperuser = false;
 
 static int	SecurityRestrictionContext = 0;
@@ -731,6 +729,7 @@ InitializeSessionUserId(const char *rolename, Oid roleid)
 	HeapTuple	roleTup;
 	Form_pg_authid rform;
 	char	   *rname;
+	bool	   is_superuser;
 
 	/*
 	 * Don't do scans if 

Re: Preventing non-superusers from altering session authorization

2023-07-08 Thread Nathan Bossart
On Sat, Jul 08, 2023 at 04:44:06PM -0400, Joseph Koshakow wrote:
> 2023-07-08 16:33:27.787 EDT [157141] PANIC:  ERRORDATA_STACK_SIZE exceeded
> 2023-07-08 16:33:27.882 EDT [156878] LOG:  server process (PID 157141) was
> terminated by signal 6: Aborted
> 2023-07-08 16:33:27.882 EDT [156878] DETAIL:  Failed process was running:
> CREATE TABLE t ();
> 
> I think the issue here is that if a session loses the ability to set
> their session authorization in the middle of a transaction, then
> rolling back the transaction may fail and cause the server to panic.
> That's probably what the deleted comment meant when it said:
> 
>> * It's OK because the check does not require catalog access and can't
>> * fail during an end-of-transaction GUC reversion

Yeah.  IIUC the ERROR longjmps to a block that calls AbortTransaction(),
which ERRORs again when resetting the session authorization, which causes
us to call AbortTransaction() again, etc., etc.

> Interestingly, if the r1 session manually types `ROLLBACK` instead of
> executing a command that fails, then everything is fine and there's no
> panic. I'm not familiar enough with transaction handling to know why
> there would be a difference there.

I haven't had a chance to dig into this one yet, but that is indeed
interesting.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: BRIN indexes vs. SK_SEARCHARRAY (and preprocessing scan keys)

2023-07-08 Thread Heikki Linnakangas

On 02/07/2023 19:09, Tomas Vondra wrote:
> Here's an updated version of the patch series.
> 
> I've polished and pushed the first three patches with cleanup, tests to
> improve test coverage and so on. I chose not to backpatch those - I
> planned to do that to make future backpatches simpler, but the changes
> ended up less disruptive than expected.
> 
> The remaining patches are just about adding SK_SEARCHARRAY to BRIN.
> 
> 0001 - adds the optional preprocess procedure, calls it from brinrescan
> 
> 0002 to 0005 - adds the support to the existing BRIN opclasses


Could you implement this completely in the consistent-function, by 
caching the sorted array in fn_extra, without adding the new preprocess 
procedure? On first call, when fn_extra == NULL, sort the array and 
stash it in fn_extra.
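
Something like this, perhaps (a sketch only; SortedKeyCache,
sort_search_array() and range_matches() are made-up names):

Datum
brin_minmax_consistent(PG_FUNCTION_ARGS)
{
    ScanKey         key = (ScanKey) PG_GETARG_POINTER(2);
    SortedKeyCache *cache = (SortedKeyCache *) fcinfo->flinfo->fn_extra;

    if (cache == NULL)
    {
        /*
         * First call in this scan: sort the SK_SEARCHARRAY elements once,
         * allocating in fn_mcxt so the cache lives as long as the FmgrInfo.
         */
        MemoryContext oldcxt = MemoryContextSwitchTo(fcinfo->flinfo->fn_mcxt);

        cache = sort_search_array(key);     /* made-up helper */
        MemoryContextSwitchTo(oldcxt);
        fcinfo->flinfo->fn_extra = cache;
    }

    /* binary-search the cached array against this range's min/max */
    PG_RETURN_BOOL(range_matches(cache, (BrinValues *) PG_GETARG_POINTER(1)));
}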


I don't think that works, because fn_extra isn't reset when the scan
keys change on rescan. We could reset it, and document that you can use
fn_extra for per-scankey caching. There's some precedent for not
resetting it though, see commit d22a09dc70f. But we could provide an
opaque per-scankey scratch space like that somewhere else. In BrinDesc,
perhaps.


The new preprocess support function feels a bit too inflexible to me.
True, you can store whatever you want in the ScanKey that it returns,
but since that's the case, why not just make it void *?  It seems that
the constraint here was that you need to pass a ScanKey to the
consistent function, because the consistent function's signature is what
it is. But we can change the signature, if we introduce a new support
amproc number for it.



> The main open question I have is what exactly it means that the
> procedure is optional. In particular, should it be supported to have a
> BRIN opclass without the "preprocess" procedure but using the other
> built-in support procedures?
> 
> For example, imagine you have a custom BRIN opclass in an extension (for
> a custom data type or something). This does not need to implement any
> procedures, it can just call the existing built-in ones. Of course, this
> won't get the "preprocess" procedure automatically.
> 
> Should we support such opclasses or should we force the extension to be
> updated by adding a preprocess procedure? I'd say "optional" means we
> should support them (otherwise it'd not really be optional).
> 
> The reason why this matters is that "amsearcharray" is an AM-level flag,
> but the support procedure is defined by the opclass. So the consistent
> function needs to handle SK_SEARCHARRAY keys both with and without
> preprocessing.
> 
> That's mostly what I did for all existing BRIN opclasses (it's a bit
> confusing that opclass may refer to either the "generic" minmax or the
> opclass defined for a particular data type). All the opclasses now
> handle three cases:
> 
> 1) scalar keys (just like before, with amsearcharray=false)
> 
> 2) array keys with preprocessing (sorted array, array of hashes, ...)
> 
> 3) array keys without preprocessing (for compatibility with old
> opclasses missing the optional preprocess procedure)
> 
> The current code is a bit ugly, because it duplicates a bunch of code:
> option (3) mostly does (1) in a loop. I'm confident this can
> be reduced by refactoring and reusing some of the "shared" code.
> 
> The question is if my interpretation of what an "optional" procedure
> means is reasonable. Thoughts?
> 
> The other thing is how to test this "compatibility" code. I assume we
> want to have the procedure for all built-in opclasses, so that won't
> exercise it. I did test it by temporarily removing the procedure from a
> couple of pg_amproc.dat entries. I guess creating a custom opclass in the
> regression test is the right solution.


It would be unpleasant to force all BRIN opclasses to immediately 
implement the searcharray-logic. If we don't want to do that, we need to 
implement generic SK_SEARCHARRAY handling in BRIN AM itself. That would 
be pretty easy, right? Just call the regular consistent function for 
every element in the array.
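
A rough sketch of that generic fallback (omitting includes; the
searcharray_consistent() name and the scalar_consistent() wrapper around
the existing support procedure are invented for illustration):

/*
 * Evaluate an SK_SEARCHARRAY key by calling the opclass's scalar
 * consistent procedure once per array element and OR-ing the results.
 */
static bool
searcharray_consistent(BrinDesc *bdesc, BrinValues *bval, ScanKey key)
{
    ArrayType  *arr = DatumGetArrayTypeP(key->sk_argument);
    int16       elmlen;
    bool        elmbyval;
    char        elmalign;
    Datum      *elems;
    bool       *nulls;
    int         nelems;
    bool        matches = false;

    get_typlenbyvalalign(ARR_ELEMTYPE(arr), &elmlen, &elmbyval, &elmalign);
    deconstruct_array(arr, ARR_ELEMTYPE(arr), elmlen, elmbyval, elmalign,
                      &elems, &nulls, &nelems);

    for (int i = 0; i < nelems && !matches; i++)
    {
        ScanKeyData elemkey = *key;

        if (nulls[i])
            continue;           /* NULL elements can't match */
        elemkey.sk_flags &= ~SK_SEARCHARRAY;
        elemkey.sk_argument = elems[i];
        matches = scalar_consistent(bdesc, bval, &elemkey);
    }

    return matches;
}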


If an opclass wants to provide a faster/better implementation, it can
provide a new consistent support procedure that supports that. Let's
assign a new amproc number for the new-style consistent function, which
must support SK_SEARCHARRAY, and pass it some scratch space where it can
cache whatever per-scankey data it needs. Because it gets a new amproc
number, we can define the arguments as we wish. We can pass a pointer to
the per-scankey scratch space as a new argument, for example.


We did this backwards-compatibility dance with the 3/4-argument variants 
of the current consistent functions. But I think assigning a whole new 
procedure number is better than looking at the number of arguments.


--
Heikki Linnakangas
Neon (https://neon.tech)





Re: Reducing connection overhead in pg_upgrade compat check phase

2023-07-08 Thread Nathan Bossart
Thanks for the new patch.

On Thu, Jul 06, 2023 at 05:58:33PM +0200, Daniel Gustafsson wrote:
>> On 4 Jul 2023, at 21:08, Nathan Bossart  wrote:
>> Also, I think it would be worth breaking check_for_data_types_usage() into
>> a few separate functions (or doing some other similar refactoring) to
>> improve readability.  At this point, the function is quite lengthy, and I
>> count 6 levels of indentation at some lines.
> 
> 
> It is pretty big for sure, but it's also IMHO not terribly complicated as
> it's not really performing any hard to follow logic.
> 
> I have no issues refactoring it, but trying my hand at it I was only making
> (what I consider) less readable code by having to jump around, so I consider
> it a failure.  If you have any suggestions, I would be more than happy to
> review and incorporate those though.

I don't have a strong opinion about this.

+   for (int rowno = 0; rowno < ntups; rowno++)
+   {
+       if (script == NULL && (script = fopen_priv(output_path, "a")) == NULL)
+           pg_fatal("could not open file \"%s\": %s",
+                    output_path, strerror(errno));
+       if (!db_used)
+       {
+           fprintf(script, "In database: %s\n", active_db->db_name);
+           db_used = true;
+       }

Since "script" will be NULL and "db_used" will be false in the first
iteration of the loop, couldn't we move this stuff to before the loop?

+   fprintf(script, " %s.%s.%s\n",
+   PQgetvalue(res, rowno, 
i_nspname),
+   PQgetvalue(res, rowno, 
i_relname),
+   PQgetvalue(res, rowno, 
i_attname));

nitpick: I think the current code has two spaces at the beginning of this
format string.  Did you mean to remove one of them?

+   if (script)
+   {
+   fclose(script);
+   script = NULL;
+   }

Won't "script" always be initialized here?  If I'm following this code
correctly, I think everything except the fclose() can be removed.

+   cur_check++;

I think this is unnecessary since we assign "cur_check" at the beginning of
every loop iteration.  I see two of these.

+static int n_data_types_usage_checks = 7;

Can we determine this programmatically so that folks don't need to remember
to update it?
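
For what it's worth, if the checks were kept in a static array, the count
could be derived with the lengthof() macro from c.h rather than maintained
by hand (a sketch; the struct and array names are illustrative):

static DataTypesUsageCheck data_types_usage_checks[] = { /* ... */ };

#define n_data_types_usage_checks lengthof(data_types_usage_checks)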

+   /* Prepare an array to store the results of checks in */
+   results = pg_malloc(sizeof(bool) * n_data_types_usage_checks);
+   memset(results, true, sizeof(*results));

IMHO it's a little strange that this is initialized to all "true", if only
because I think most other Postgres code does the opposite.

+bool
+check_for_aclitem_data_type_usage(ClusterInfo *cluster)

Do you think we should rename these functions to something like
"should_check_for_*"?  They don't actually do the check, they just tell you
whether you should based on the version.  In fact, I wonder if we could
just add the versions directly to data_types_usage_checks so that we don't
need the separate hook functions.
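
Something along these lines, perhaps (a sketch; the field names are
invented):

typedef struct
{
    const char *status;             /* progress message */
    const char *query;              /* returns the affected attributes */
    const char *report_filename;
    int         max_version;        /* only check old clusters <= this */
} DataTypesUsageCheck;

The driver loop could then compare the old cluster's major version against
max_version instead of calling a per-check hook function.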

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: DecodeInterval fixes

2023-07-08 Thread Gurjeet Singh
On Sat, Jul 8, 2023 at 1:33 PM Tom Lane  wrote:
>
> Gurjeet Singh  writes:
> > On Fri, Jul 7, 2023 at 4:13 PM Tom Lane  wrote:
> >> The ECPG datetime datatype support code has been basically unmaintained
> >> for years, and has diverged quite far at this point from the core code.
>
> > I was under the impression that anything in the postgresql.git
> > repository is considered core, and hence maintained as one unit, from
> > release to release.
>
> When I say that ecpglib is next door to unmaintained, I'm just stating
> the facts on the ground; project policy is irrelevant.  That situation
> is not likely to change until somebody steps up to do (a lot of) work
> on it, which is probably unlikely to happen unless we start getting
> actual complaints from ECPG users.  For the meantime, what's there now
> seems to be satisfactory to whoever is using it ... which might be
> nobody?
>
> In any case, you don't have to look far to notice that some parts of
> the tree are maintained far more actively than others.  ecpglib is
> just one of the more identifiable bits that's receiving little love.
> The quality of the code under contrib/ is wildly variable, and even
> the server code itself has backwaters.  (For instance, the "bit" types,
> which aren't even in the standard anymore; or the geometric types,
> or "money".)

Thanks for sharing your view on different parts of the code. This gives
clear direction to anyone interested in stepping up.

As part of mentoring a GSoC 2023 participant, last night we were
looking at the TODO wiki page for something for the mentee to pick up
next. I feel the staleness/deficiencies you mention above are not
captured in the TODO wiki page. It'd be nice if these were documented,
so that newcomers to the community can pick up work that they feel is
an easy lift for them.

> By and large, I don't see this unevenness of maintenance effort as
> a problem.  It's more like directing our limited resources into the
> most useful places.  Code that isn't getting worked on is either not
> used at all by anybody, or it serves the needs of those who use it
> well enough already.  Since it's difficult to tell which of those
> cases applies, removing code just because it's not been improved
> lately is a hard choice to sell.  But so is putting maintenance effort
> into code that there might be little audience for.  In the end we
> solve this via the principle of "scratch your own itch": if somebody
> is concerned enough about a particular piece of code to put their own
> time into improving it, then great, it'll get improved.
>
> > Benign neglect doesn't sound nice from a user's/consumer's
> > perspective. Can it be labeled (i.e. declared as such in docs) as
> > deprecated.
>
> Deprecation would imply that we're planning to remove it, which
> we are not.

Good to know. Sorry, I took "benign neglect" to mean that there's no
willingness to improve it. This makes it clear that the community wants
to improve and maintain ECPG; it's just a matter of someone volunteering
and making better use of the resources available.



Best regards,
Gurjeet
http://Gurje.et




Re: Preventing non-superusers from altering session authorization

2023-07-08 Thread Joseph Koshakow
I've discovered an issue with this approach. Let's say you have some
session open that is connected as a superuser and you run the following
commands:

  - CREATE ROLE r1 LOGIN SUPERUSER;
  - CREATE ROLE r2;
  - CREATE ROLE r3;

Then you open another session connected with user r1 and run the
following commands:

  - SET SESSION AUTHORIZATION r2;
  - BEGIN;
  - SET SESSION AUTHORIZATION r3;

Then in your original session run:

  - ALTER ROLE r1 NOSUPERUSER;

Finally in the r1 session run:

  - CREATE TABLE t ();

Postgres will then panic with the following logs:

2023-07-08 16:33:27.787 EDT [157141] ERROR:  permission denied for schema
public at character 14
2023-07-08 16:33:27.787 EDT [157141] STATEMENT:  CREATE TABLE t ();
2023-07-08 16:33:27.787 EDT [157141] ERROR:  permission denied to set
session authorization
2023-07-08 16:33:27.787 EDT [157141] WARNING:  AbortTransaction while in
ABORT state
2023-07-08 16:33:27.787 EDT [157141] ERROR:  permission denied to set
session authorization
2023-07-08 16:33:27.787 EDT [157141] WARNING:  AbortTransaction while in
ABORT state
2023-07-08 16:33:27.787 EDT [157141] ERROR:  permission denied to set
session authorization
2023-07-08 16:33:27.787 EDT [157141] WARNING:  AbortTransaction while in
ABORT state
2023-07-08 16:33:27.787 EDT [157141] ERROR:  permission denied to set
session authorization
2023-07-08 16:33:27.787 EDT [157141] PANIC:  ERRORDATA_STACK_SIZE exceeded
2023-07-08 16:33:27.882 EDT [156878] LOG:  server process (PID 157141) was
terminated by signal 6: Aborted
2023-07-08 16:33:27.882 EDT [156878] DETAIL:  Failed process was running:
CREATE TABLE t ();

I think the issue here is that if a session loses the ability to set
their session authorization in the middle of a transaction, then
rolling back the transaction may fail and cause the server to panic.
That's probably what the deleted comment meant when it said:

> * It's OK because the check does not require catalog access and can't
> * fail during an end-of-transaction GUC reversion

Interestingly, if the r1 session manually types `ROLLBACK` instead of
executing a command that fails, then everything is fine and there's no
panic. I'm not familiar enough with transaction handling to know why
there would be a difference there.

Thanks,
Joe Koshakow


Re: DecodeInterval fixes

2023-07-08 Thread Tom Lane
Gurjeet Singh  writes:
> On Fri, Jul 7, 2023 at 4:13 PM Tom Lane  wrote:
>> The ECPG datetime datatype support code has been basically unmaintained
>> for years, and has diverged quite far at this point from the core code.

> I was under the impression that anything in the postgresql.git
> repository is considered core, and hence maintained as one unit, from
> release to release.

When I say that ecpglib is next door to unmaintained, I'm just stating
the facts on the ground; project policy is irrelevant.  That situation
is not likely to change until somebody steps up to do (a lot of) work
on it, which is probably unlikely to happen unless we start getting
actual complaints from ECPG users.  For the meantime, what's there now
seems to be satisfactory to whoever is using it ... which might be
nobody?

In any case, you don't have to look far to notice that some parts of
the tree are maintained far more actively than others.  ecpglib is
just one of the more identifiable bits that's receiving little love.
The quality of the code under contrib/ is wildly variable, and even
the server code itself has backwaters.  (For instance, the "bit" types,
which aren't even in the standard anymore; or the geometric types,
or "money".)

By and large, I don't see this unevenness of maintenance effort as
a problem.  It's more like directing our limited resources into the
most useful places.  Code that isn't getting worked on is either not
used at all by anybody, or it serves the needs of those who use it
well enough already.  Since it's difficult to tell which of those
cases applies, removing code just because it's not been improved
lately is a hard choice to sell.  But so is putting maintenance effort
into code that there might be little audience for.  In the end we
solve this via the principle of "scratch your own itch": if somebody
is concerned enough about a particular piece of code to put their own
time into improving it, then great, it'll get improved.

> Benign neglect doesn't sound nice from a user's/consumer's
> perspective. Can it be labeled (i.e. declared as such in docs) as
> deprecated.

Deprecation would imply that we're planning to remove it, which
we are not.

regards, tom lane




Re: Cleaning up array_in()

2023-07-08 Thread Heikki Linnakangas

On 08/07/2023 22:49, Tom Lane wrote:

> BTW, what's your opinion of allowing "[1:0]={}" ?  Although that was
> my proposal to begin with, I'm having second thoughts about it now.
> The main reason is that the input transformation would be lossy,
> eg "[1:0]={}" and "[101:100]={}" would give the same results, which
> seems a little ugly.


Hmm, yeah, that would feel wrong if you did something like this:

select ('[2:1]={}'::text[]) || '{x}'::text[];

and expected it to return '[2:2]={x}'.

I guess we could allow "[1:0]={}" as a special case but not
"[101:100]={}", though that would be weird too.



> Given the lack of field complaints, maybe we should leave that
> alone.

+1 to leave it alone. It's a little weird either way, so better to stay
put. We can revisit it later if we want to, but I wouldn't want to go
back and forth on it.


--
Heikki Linnakangas
Neon (https://neon.tech)





Re: Cleaning up array_in()

2023-07-08 Thread Tom Lane
Heikki Linnakangas  writes:
> On 08/07/2023 19:08, Tom Lane wrote:
>> That got sideswiped by ae6d06f09, so here's a trivial rebase to
>> pacify the cfbot.

> Something's wrong with your attachments.

Yeah, I forgot to run mhbuild :-(

> I spent some time today refactoring it for readability and speed. I 
> introduced a separate helper function to tokenize the input. It deals 
> with whitespace, escapes, and backslashes. Then I merged ArrayCount() 
> and ReadArrayStr() into one function that parses the elements and 
> determines the dimensions in one pass. That speeds up parsing large 
> arrays. With the tokenizer function, the logic in ReadArrayStr() is 
> still quite readable, even though it's now checking the dimensions at 
> the same time.

Oh, thanks for taking a look!

> I also noticed that we used atoi() to parse the integers in the 
> dimensions, which doesn't do much error checking.

Yup, I'd noticed that too but not gotten around to doing anything
about it.  I agree with nailing it down better as long as we're
tightening things in this area.

> Attached are your patches, rebased to fix the conflicts with ae6d06f09 
> like you intended. On top of that, my patches. My patches need more 
> testing, benchmarking, and review, so if we want to sneak something into 
> v16, better go with just your patches.

At this point I'm only proposing this for v17, so additional cleanup
is welcome.

BTW, what's your opinion of allowing "[1:0]={}" ?  Although that was
my proposal to begin with, I'm having second thoughts about it now.
The main reason is that the input transformation would be lossy,
eg "[1:0]={}" and "[101:100]={}" would give the same results, which
seems a little ugly.  Given the lack of field complaints, maybe we
should leave that alone.

regards, tom lane




Re: DecodeInterval fixes

2023-07-08 Thread Gurjeet Singh
On Fri, Jul 7, 2023 at 4:13 PM Tom Lane  wrote:
>
> The ECPG datetime datatype support code has been basically unmaintained
> for years, and has diverged quite far at this point from the core code.

I was under the impression that anything in the postgresql.git
repository is considered core, and hence maintained as one unit, from
release to release. An example of this, to me, were all the contrib/*
modules.

> I wouldn't expect that a patch to the core code necessarily applies
> easily to ECPG, nor would I expect somebody patching the core to bother
> trying.

The above statement makes me think that only the code inside
src/backend/ is considered core. Is that the right assumption?

> Perhaps modernizing/resyncing that ECPG code would be a worthwhile
> undertaking, but it'd be a mighty large one, and I'm not sure about
> the size of the return.  In the meantime, benign neglect is the policy.

Benign neglect doesn't sound nice from a user's/consumer's
perspective. Can it be labeled (i.e. declared as such in docs) as
deprecated.

Knowing that the tool you use has now been deprecated would be a
better message for someone still using it, even if it was left marked
deprecated indefinitely. Discovering benign neglect for the tool you
depend on, from secondary sources (like this list, forums, etc.), does
not evoke a lot of confidence.

Best regards,
Gurjeet
http://Gurje.et




Re: Cleaning up array_in()

2023-07-08 Thread Heikki Linnakangas

On 08/07/2023 19:08, Tom Lane wrote:

> I wrote:
>> So I end up with the attached.  I went ahead and dropped
>> ArrayGetOffset0() as part of 0001, and I split 0002 into two patches
>> where the new 0002 avoids re-indenting any existing code in order
>> to ease review, and then 0003 is just a mechanical application
>> of pgindent.
> 
> That got sideswiped by ae6d06f09, so here's a trivial rebase to
> pacify the cfbot.
> 
> #text/x-diff; name="v3-0001-Simplify-and-speed-up-ReadArrayStr.patch"
> [v3-0001-Simplify-and-speed-up-ReadArrayStr.patch]
> /home/tgl/pgsql/v3-0001-Simplify-and-speed-up-ReadArrayStr.patch
> #text/x-diff;
> name="v3-0002-Rewrite-ArrayCount-to-make-dimensionality-checks-.patch"
> [v3-0002-Rewrite-ArrayCount-to-make-dimensionality-checks-.patch]
> /home/tgl/pgsql/v3-0002-Rewrite-ArrayCount-to-make-dimensionality-checks-.patch
> #text/x-diff; name="v3-0003-Re-indent-ArrayCount.patch"
> [v3-0003-Re-indent-ArrayCount.patch] /home/tgl/pgsql/v3-0003-Re-indent-ArrayCount.patch


Something's wrong with your attachments.

Hmm, I wonder if ae6d06f09 had a negative performance impact. In an
unquoted array element, the scanner_isspace() function is called for every
character, so it might be worth inlining.


On the patches: They are a clear improvement, thanks for that. That 
said, I still find the logic very hard to follow, and there are some 
obvious performance optimizations that could be made.


ArrayCount() interprets low-level quoting and escaping, and tracks the 
dimensions at the same time. The state machine is pretty complicated. 
And when you've finally finished reading and grokking that function, you 
see that ReadArrayStr() repeats most of the same logic. Ugh.


I spent some time today refactoring it for readability and speed. I 
introduced a separate helper function to tokenize the input. It deals 
with whitespace, escapes, and backslashes. Then I merged ArrayCount() 
and ReadArrayStr() into one function that parses the elements and 
determines the dimensions in one pass. That speeds up parsing large 
arrays. With the tokenizer function, the logic in ReadArrayStr() is 
still quite readable, even though it's now checking the dimensions at 
the same time.


I also noticed that we used atoi() to parse the integers in the 
dimensions, which doesn't do much error checking. Some funny cases were 
accepted because of that, for example:


postgres=# select '[1+-+-+-+-+-+:2]={foo,bar}'::text[];
   text
---
 {foo,bar}
(1 row)

I tightened that up in passing.
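
For what it's worth, the usual shape of that tightening is to switch from
atoi() to strtol() and inspect the end pointer (a sketch, not the actual
patch code; it uses <stdlib.h> and <errno.h>):

static bool
parse_dim_bound(const char *str, long *result, const char **endp)
{
    char       *end;

    errno = 0;
    *result = strtol(str, &end, 10);
    if (end == str || errno == ERANGE)
        return false;           /* no digits, or out of range */
    *endp = end;                /* caller checks for ':' or ']' next */
    return true;
}

Unlike atoi(), this rejects inputs with no leading digits and lets the
caller detect trailing junk such as "1+-+-+-".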

Attached are your patches, rebased to fix the conflicts with ae6d06f09 
like you intended. On top of that, my patches. My patches need more 
testing, benchmarking, and review, so if we want to sneak something into 
v16, better go with just your patches. If we're tightening up the 
accepted inputs, maybe fix that atoi() sloppiness, though.


--
Heikki Linnakangas
Neon (https://neon.tech)
From 8c036f459c4e95fc3d0edf0d24a4e7c88443e401 Mon Sep 17 00:00:00 2001
From: Tom Lane 
Date: Tue, 4 Jul 2023 12:39:41 -0400
Subject: [PATCH v3 1/5] Simplify and speed up ReadArrayStr().

ReadArrayStr() seems to have been written on the assumption that
non-rectangular input is fine and it should pad with NULLs anywhere
that elements are missing.  We disallowed non-rectangular input
ages ago (commit 0e13d627b), but never simplified this function
as a follow-up.  In particular, the existing code recomputes each
element's linear location from scratch, which is quite unnecessary
for rectangular input: we can just assign the elements sequentially,
saving lots of arithmetic.  Add some more commentary while at it.

ArrayGetOffset0() is no longer used anywhere, so remove it.
---
 src/backend/utils/adt/arrayfuncs.c | 69 ++
 src/backend/utils/adt/arrayutils.c | 15 ---
 src/include/utils/array.h  |  1 -
 3 files changed, 33 insertions(+), 52 deletions(-)

diff --git a/src/backend/utils/adt/arrayfuncs.c b/src/backend/utils/adt/arrayfuncs.c
index 4359dbd83d..967aff8cf9 100644
--- a/src/backend/utils/adt/arrayfuncs.c
+++ b/src/backend/utils/adt/arrayfuncs.c
@@ -93,7 +93,7 @@ typedef struct ArrayIteratorData
 static int	ArrayCount(const char *str, int *dim, char typdelim,
 	   Node *escontext);
 static bool ReadArrayStr(char *arrayStr, const char *origStr,
-		 int nitems, int ndim, int *dim,
+		 int nitems,
 		 FmgrInfo *inputproc, Oid typioparam, int32 typmod,
 		 char typdelim,
 		 int typlen, bool typbyval, char typalign,
@@ -391,7 +391,7 @@ array_in(PG_FUNCTION_ARGS)
 	dataPtr = (Datum *) palloc(nitems * sizeof(Datum));
 	nullsPtr = (bool *) palloc(nitems * sizeof(bool));
 	if (!ReadArrayStr(p, string,
-	  nitems, ndim, dim,
+	  nitems,
 	  _extra->proc, typioparam, typmod,
 	  typdelim,
 	  typlen, typbyval, typalign,
@@ -436,7 +436,8 @@ array_in(PG_FUNCTION_ARGS)
 
 /*
  * ArrayCount
- *	 Determines the dimensions for an array string.
+ *	 Determines the dimensions for an array string.  This includes
+ *	 

Re: Cleaning up array_in()

2023-07-08 Thread Tom Lane
I wrote:
> That got sideswiped by ae6d06f09, so here's a trivial rebase to
> pacify the cfbot.

Sigh, this time with patch.

regards, tom lane

From 63ee707f6aa38043c0211c0c4a124fa5fe2ad016 Mon Sep 17 00:00:00 2001
From: Tom Lane 
Date: Sat, 8 Jul 2023 11:55:56 -0400
Subject: [PATCH v3 1/3] Simplify and speed up ReadArrayStr().

ReadArrayStr() seems to have been written on the assumption that
non-rectangular input is fine and it should pad with NULLs anywhere
that elements are missing.  We disallowed non-rectangular input
ages ago (commit 0e13d627b), but never simplified this function
as a follow-up.  In particular, the existing code recomputes each
element's linear location from scratch, which is quite unnecessary
for rectangular input: we can just assign the elements sequentially,
saving lots of arithmetic.  Add some more commentary while at it.

ArrayGetOffset0() is no longer used anywhere, so remove it.
---
 src/backend/utils/adt/arrayfuncs.c | 69 ++
 src/backend/utils/adt/arrayutils.c | 15 ---
 src/include/utils/array.h  |  1 -
 3 files changed, 33 insertions(+), 52 deletions(-)

diff --git a/src/backend/utils/adt/arrayfuncs.c b/src/backend/utils/adt/arrayfuncs.c
index 4359dbd83d..967aff8cf9 100644
--- a/src/backend/utils/adt/arrayfuncs.c
+++ b/src/backend/utils/adt/arrayfuncs.c
@@ -93,7 +93,7 @@ typedef struct ArrayIteratorData
 static int	ArrayCount(const char *str, int *dim, char typdelim,
 	   Node *escontext);
 static bool ReadArrayStr(char *arrayStr, const char *origStr,
-		 int nitems, int ndim, int *dim,
+		 int nitems,
 		 FmgrInfo *inputproc, Oid typioparam, int32 typmod,
 		 char typdelim,
 		 int typlen, bool typbyval, char typalign,
@@ -391,7 +391,7 @@ array_in(PG_FUNCTION_ARGS)
 	dataPtr = (Datum *) palloc(nitems * sizeof(Datum));
 	nullsPtr = (bool *) palloc(nitems * sizeof(bool));
 	if (!ReadArrayStr(p, string,
-	  nitems, ndim, dim,
+	  nitems,
 	  _extra->proc, typioparam, typmod,
 	  typdelim,
 	  typlen, typbyval, typalign,
@@ -436,7 +436,8 @@ array_in(PG_FUNCTION_ARGS)
 
 /*
  * ArrayCount
- *	 Determines the dimensions for an array string.
+ *	 Determines the dimensions for an array string.  This includes
+ *	 syntax-checking the array structure decoration (braces and delimiters).
  *
  * Returns number of dimensions as function result.  The axis lengths are
  * returned in dim[], which must be of size MAXDIM.
@@ -683,16 +684,14 @@ ArrayCount(const char *str, int *dim, char typdelim, Node *escontext)
 /*
  * ReadArrayStr :
  *	 parses the array string pointed to by "arrayStr" and converts the values
- *	 to internal format.  Unspecified elements are initialized to nulls.
- *	 The array dimensions must already have been determined.
+ *	 to internal format.  The array dimensions must have been determined,
+ *	 and the case of an empty array must have been handled earlier.
  *
  * Inputs:
  *	arrayStr: the string to parse.
  *			  CAUTION: the contents of "arrayStr" will be modified!
  *	origStr: the unmodified input string, used only in error messages.
  *	nitems: total number of array elements, as already determined.
- *	ndim: number of array dimensions
- *	dim[]: array axis lengths
  *	inputproc: type-specific input procedure for element datatype.
  *	typioparam, typmod: auxiliary values to pass to inputproc.
  *	typdelim: the value delimiter (type-specific).
@@ -717,8 +716,6 @@ static bool
 ReadArrayStr(char *arrayStr,
 			 const char *origStr,
 			 int nitems,
-			 int ndim,
-			 int *dim,
 			 FmgrInfo *inputproc,
 			 Oid typioparam,
 			 int32 typmod,
@@ -732,20 +729,13 @@ ReadArrayStr(char *arrayStr,
 			 int32 *nbytes,
 			 Node *escontext)
 {
-	int			i,
+	int			i = 0,
 nest_level = 0;
 	char	   *srcptr;
 	bool		in_quotes = false;
 	bool		eoArray = false;
 	bool		hasnull;
 	int32		totbytes;
-	int			indx[MAXDIM] = {0},
-prod[MAXDIM];
-
-	mda_get_prod(ndim, dim, prod);
-
-	/* Initialize is-null markers to true */
-	memset(nulls, true, nitems * sizeof(bool));
 
 	/*
 	 * We have to remove " and \ characters to create a clean item value to
@@ -768,11 +758,20 @@ ReadArrayStr(char *arrayStr,
 		bool		itemdone = false;
 		bool		leadingspace = true;
 		bool		hasquoting = false;
-		char	   *itemstart;
-		char	   *dstptr;
-		char	   *dstendptr;
+		char	   *itemstart;	/* start of de-escaped text */
+		char	   *dstptr;		/* next output point for de-escaped text */
+		char	   *dstendptr;	/* last significant output char + 1 */
 
-		i = -1;
+		/*
+		 * Parse next array element, collecting the de-escaped text into
+		 * itemstart..dstendptr-1.
+		 *
+		 * Notice that we do not set "itemdone" until we see a separator
+		 * (typdelim character) or the array's final right brace.  Since the
+		 * array is already verified to be nonempty and rectangular, there is
+		 * guaranteed to be another element to be processed in the first case,
+		 * while in the second case of course 

Re: Preventing non-superusers from altering session authorization

2023-07-08 Thread Joseph Koshakow
Nathan Bossart  wrote:

> I see that RESET SESSION AUTHORIZATION
> with a concurrently dropped role will FATAL with your patch but succeed
> without it, which could be part of the reason.

I didn't even realize it, but the change to superuser_arg() in v2 fixed
this issue. The catalog lookup is only done if
userid != AuthenticatedUserId. So RESET SESSION AUTHORIZATION with a
concurrently dropped role will no longer FATAL.

Thanks,
Joe

On Sat, Jul 1, 2023 at 11:33 AM Joseph Koshakow  wrote:

> >> That might be a good change? If the original authenticated role ID no
> >> longer exists then we may want to return an error when trying to set
> >> your session authorization to that role.
> >
> > I was curious why we don't block DROP ROLE if there are active sessions
> for
> > the role or terminate any such sessions as part of the command, and I
> found
> > this discussion from 2016:
> >
> >https://postgr.es/m/flat/56E87CD8.60007%40ohmu.fi
>
> Ah, that makes sense that we don't prevent DROP ROLE on active roles.
> Though, we do error when you try to set your role or session
> authorization to a dropped role. So erroring on RESET SESSION
> AUTHORIZATION when the original role is dropped makes it consistent
> with SET SESSION AUTHORIZATION TO <dropped role>. On the other
> hand it makes it inconsistent with RESET ROLE, which does not error on
> a dropped role.
>
> - Joe Koshakow
>
> On Fri, Jun 23, 2023 at 1:54 PM Nathan Bossart 
> wrote:
>
>> On Thu, Jun 22, 2023 at 06:39:45PM -0400, Joseph Koshakow wrote:
>> > On Wed, Jun 21, 2023 at 11:48 PM Nathan Bossart <
>> nathandboss...@gmail.com>
>> > wrote:
>> >> I see that RESET SESSION AUTHORIZATION
>> >> with a concurrently dropped role will FATAL with your patch but succeed
>> >> without it, which could be part of the reason.
>> >
>> > That might be a good change? If the original authenticated role ID no
>> > longer exists then we may want to return an error when trying to set
>> > your session authorization to that role.
>>
>> I was curious why we don't block DROP ROLE if there are active sessions
>> for
>> the role or terminate any such sessions as part of the command, and I
>> found
>> this discussion from 2016:
>>
>> https://postgr.es/m/flat/56E87CD8.60007%40ohmu.fi
>>
>> --
>> Nathan Bossart
>> Amazon Web Services: https://aws.amazon.com
>>
>


Re: Infinite Interval

2023-07-08 Thread Tom Lane
Ashutosh Bapat  writes:
> Fixed assertion in time_mi_time(). It needed to assert that the result
> is FINITE, but it was asserting the opposite, and that triggered some
> failures in the cfbot.

It's still not passing in the cfbot, at least not on any non-Linux
platforms.  I believe the reason is that the patch thinks isinf()
delivers a three-way result, but per POSIX you can only expect
zero or nonzero (ie, finite or not).
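
For reference, the portable idiom is simply this (a minimal sketch,
not taken from the patch):

#include <math.h>
#include <stdbool.h>

/*
 * POSIX only promises that isinf() returns zero for finite inputs and
 * nonzero for infinities; returning -1 for -Inf and +1 for +Inf is a
 * glibc extension.  Test the sign explicitly instead:
 */
static bool
is_negative_infinity(double val)
{
	return isinf(val) && val < 0.0;	/* not: isinf(val) == -1 */
}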

regards, tom lane




Re: Check lateral references within PHVs for memoize cache keys

2023-07-08 Thread Tom Lane
Richard Guo  writes:
> Rebase the patch on HEAD as cfbot reminds.

This fix seems like a mess.  The function that is in charge of filling
RelOptInfo.lateral_vars is extract_lateral_references; or at least
that was how it was done up to now.  Can't we compute these additional
references there?  If not, maybe we ought to just merge
extract_lateral_references into create_lateral_join_info, rather than
having the responsibility split.  I also wonder whether this change
isn't creating hidden dependencies on RTE order (which would likely be
bugs), since create_lateral_join_info itself examines the lateral_vars
lists as it walks the rtable.

More generally, it's not clear to me why we should need to look inside
lateral PHVs in the first place.  Wouldn't the lateral PHV itself
serve fine as a cache key?

regards, tom lane




Re: DecodeInterval fixes

2023-07-08 Thread Joseph Koshakow
Jacob Champion  writes:
> Hi Joe, here's a partial review:

Thanks so much for the review!

> I'm new to this code, but I agree that the use of `type` and the
> lookahead are not particularly obvious/intuitive. At the very least,
> they'd need some more explanation in the code. Your boolean flag idea
> sounds reasonable, though.

I've updated the patch with the boolean flag idea. I think it's a
bit cleaner and more readable.
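
In a nutshell, the flag works like this (a standalone toy, not the
DecodeInterval code itself; recall that DecodeInterval scans the
fields right to left, picking up units before values):

#include <ctype.h>
#include <stdbool.h>
#include <string.h>

static bool
toy_interval_units_ok(const char **fields, int nf)
{
	bool		pending_unit = false;

	for (int i = nf - 1; i >= 0; i--)
	{
		if (isalpha((unsigned char) fields[i][0]))
		{
			if (pending_unit)
				return false;	/* consecutive units: "days days" */
			if (strcmp(fields[i], "ago") == 0)
			{
				if (i != nf - 1)
					return false;	/* "ago" only allowed at the end */
				continue;
			}
			pending_unit = true;
		}
		else
			pending_unit = false;	/* a value satisfied the unit */
	}
	return !pending_unit;		/* bare unit with no value: "day" */
}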

>> There is one more problem I noticed, but didn't fix. We allow multiple
>> "@" to be sprinkled anywhere in the input, even though the docs [0]
>> only allow it to appear at the beginning of the input.
>
> (No particular opinion on this.)

I looked into this a bit. The reason this works is that the datetime
lexer filters out all punctuation. That's what allows us to parse
things like `SELECT date 'January 8, 1999';`. It's probably not worth
trying to be smarter about what punctuation we allow where, at least
for now. Maybe in the future we can exclude "@" from the punctuation
that gets filtered out.

> It looks like this patch needs a rebase for the CI, too, but there are
> no conflicts.

The attached patch is rebased against master.

Thanks,
Joe Koshakow
From eee98dd65c3556528803b6ee2cab10e9ece8d871 Mon Sep 17 00:00:00 2001
From: Joseph Koshakow 
Date: Sun, 9 Apr 2023 20:37:27 -0400
Subject: [PATCH] Fix interval decode handling of invalid intervals

This patch does three things in the DecodeInterval function:

1) Removes dead code for handling unit type RESERV. There used to be
a unit called "invalid" that was of type RESERV. At some point that
unit was removed and there were no more units of type RESERV.
Therefore, the code for RESERV unit handling is unreachable.

2) Restricts the unit "ago" to only appear at the end of the
interval. According to the docs [0], this is the only valid place to
put it, but we allowed it multiple times at any point in the input.

3) Errors when the user has multiple consecutive units or a unit without
an accompanying value.

[0] https://www.postgresql.org/docs/current/datatype-datetime.html#DATATYPE-INTERVAL-INPUT
---
 src/backend/utils/adt/datetime.c   | 25 +++--
 src/test/regress/expected/interval.out | 18 ++
 src/test/regress/sql/interval.sql  |  7 +++
 3 files changed, 44 insertions(+), 6 deletions(-)

diff --git a/src/backend/utils/adt/datetime.c b/src/backend/utils/adt/datetime.c
index 5d8d583ddc..b930a67007 100644
--- a/src/backend/utils/adt/datetime.c
+++ b/src/backend/utils/adt/datetime.c
@@ -3278,6 +3278,7 @@ DecodeInterval(char **field, int *ftype, int nf, int range,
 {
 	bool		force_negative = false;
 	bool		is_before = false;
+	bool		parsing_unit_val = false;
 	char	   *cp;
 	int			fmask = 0,
 tmask,
@@ -3336,6 +3337,7 @@ DecodeInterval(char **field, int *ftype, int nf, int range,
 	itm_in->tm_usec > 0)
 	itm_in->tm_usec = -itm_in->tm_usec;
 type = DTK_DAY;
+parsing_unit_val = false;
 break;
 
 			case DTK_TZ:
@@ -3373,6 +3375,7 @@ DecodeInterval(char **field, int *ftype, int nf, int range,
 	 * are reading right to left.
 	 */
 	type = DTK_DAY;
+	parsing_unit_val = false;
 	break;
 }
 
@@ -3562,10 +3565,14 @@ DecodeInterval(char **field, int *ftype, int nf, int range,
 	default:
 		return DTERR_BAD_FORMAT;
 }
+parsing_unit_val = false;
 break;
 
 			case DTK_STRING:
 			case DTK_SPECIAL:
+/* reject consecutive unhandled units */
+if (parsing_unit_val)
+	return DTERR_BAD_FORMAT;
 type = DecodeUnits(i, field[i], );
 if (type == IGNORE_DTF)
 	continue;
@@ -3575,16 +3582,18 @@ DecodeInterval(char **field, int *ftype, int nf, int range,
 {
 	case UNITS:
 		type = uval;
+		parsing_unit_val = true;
 		break;
 
 	case AGO:
-		is_before = true;
-		type = uval;
-		break;
 
-	case RESERV:
-		tmask = (DTK_DATE_M | DTK_TIME_M);
-		*dtype = uval;
+		/*
+		 * 'ago' is only allowed to appear at the end of the
+		 * interval.
+		 */
+		if (i != nf - 1)
+			return DTERR_BAD_FORMAT;
+		is_before = true;
 		break;
 
 	default:
@@ -3605,6 +3614,10 @@ DecodeInterval(char **field, int *ftype, int nf, int range,
 	if (fmask == 0)
 		return DTERR_BAD_FORMAT;
 
+	/* reject if unit appeared and was never handled */
+	if (parsing_unit_val)
+		return DTERR_BAD_FORMAT;
+
 	/* finally, AGO negates everything */
 	if (is_before)
 	{
diff --git a/src/test/regress/expected/interval.out b/src/test/regress/expected/interval.out
index 28b71d9681..7aba799351 100644
--- a/src/test/regress/expected/interval.out
+++ b/src/test/regress/expected/interval.out
@@ -1787,3 +1787,21 @@ SELECT extract(epoch from interval '10 days');
  864000.00
 (1 row)
 
+-- test that ago can only appear once at the end of the interval.
+SELECT INTERVAL '42 days 2 seconds ago ago';
+ERROR:  invalid input syntax for type interval: "42 days 2 seconds ago 

Re: psql: Add role's membership options to the \du+ command

2023-07-08 Thread Tom Lane
Pavel Luzanov  writes:
> Please find attached new patch version.
> It implements \drg command and hides duplicates in \du & \dg commands.

I took a quick look through this, and have some minor suggestions:

1. I was thinking in terms of dropping the "Member of" column entirely
in \du and \dg.  It doesn't tell you enough, and the output of those
commands is often too wide already.

2. You have describeRoleGrants() set up to localize "ADMIN", "INHERIT",
and "SET".  Since those are SQL keywords, our usual practice is to not
localize them; this'd simplify the code.

3. Not sure about use of LEFT JOIN in the query.  That will mean we
get a row out even for roles that have no grants, which seems like
clutter.  The LEFT JOINs to r and g are fine, but I suggest changing
the first join to a plain join.

Beyond those nits, I think this is a good approach and we should
adopt it (including in v16).

regards, tom lane




Re: Cleaning up array_in()

2023-07-08 Thread Tom Lane
I wrote:
> So I end up with the attached.  I went ahead and dropped
> ArrayGetOffset0() as part of 0001, and I split 0002 into two patches
> where the new 0002 avoids re-indenting any existing code in order
> to ease review, and then 0003 is just a mechanical application
> of pgindent.

That got sideswiped by ae6d06f09, so here's a trivial rebase to
pacify the cfbot.

regards, tom lane

#text/x-diff; name="v3-0001-Simplify-and-speed-up-ReadArrayStr.patch" 
[v3-0001-Simplify-and-speed-up-ReadArrayStr.patch] 
/home/tgl/pgsql/v3-0001-Simplify-and-speed-up-ReadArrayStr.patch
#text/x-diff; 
name="v3-0002-Rewrite-ArrayCount-to-make-dimensionality-checks-.patch" 
[v3-0002-Rewrite-ArrayCount-to-make-dimensionality-checks-.patch] 
/home/tgl/pgsql/v3-0002-Rewrite-ArrayCount-to-make-dimensionality-checks-.patch
#text/x-diff; name="v3-0003-Re-indent-ArrayCount.patch" 
[v3-0003-Re-indent-ArrayCount.patch] 
/home/tgl/pgsql/v3-0003-Re-indent-ArrayCount.patch




Re: pg_basebackup check vs Windows file path limits

2023-07-08 Thread Andrew Dunstan


On 2023-07-08 Sa 09:15, Andrew Dunstan wrote:



On 2023-07-06 Th 12:38, Andrew Dunstan wrote:



On 2023-07-06 Th 09:50, Daniel Gustafsson wrote:

On 5 Jul 2023, at 14:49, Andrew Dunstan  wrote:
On 2023-07-04 Tu 16:54, Daniel Gustafsson wrote:

On 4 Jul 2023, at 20:19, Andrew Dunstan
  wrote:

But sadly we're kinda back where we started. fairywren is failing on REL_16_STABLE. 
Before the changes the failure occurred because the test script was unable to create 
the file with a path > 255. Now that we have a way to create the file the test for 
pg_basebackup to reject files with names > 100 fails, I presume because the server 
can't actually see the file. At this stage I'm thinking the best thing would be to 
skip the test altogether on Windows if the path is longer than 255.


That does sound like a fairly large hammer for a nail small enough that we
should be able to fix it, but I don't have any other good ideas off the cuff.

Not sure it's such a big hammer. Here's a patch.

No objections to the patch, LGTM.



Thanks. pushed with a couple of tweaks.





Unfortunately, skipping this has now exposed a further problem in this 
test.



Here's the relevant log extracted from 
, 
starting with the skip mentioned above:



[23:29:21.661](0.002s) ok 98 # skip File path too long
### Stopping node "main" using mode fast
# Running: pg_ctl -D 
C:\\tools\\nmsys64\\home\\pgrunner\\bf\\root\\REL_16_STABLE\\pgsql.build/testrun/pg_basebackup/010_pg_basebackup/data/t_010_pg_basebackup_main_data/pgdata
 -m fast stop
waiting for server to shut down done
server stopped
# No postmaster PID for node "main"
Junction created for 
C:\\tools\\nmsys64\\home\\pgrunner\\bf\\root\\REL_16_STABLE\\pgsql.build\\testrun\\pg_basebackup\\010_pg_basebackup\\data\\t_010_pg_basebackup_main_data\\pgdata\\pg_replslot
 <<===>> 
C:\\tools\\nmsys64\\home\\pgrunner\\bf\\root\\REL_16_STABLE\\pgsql.build\\testrun\\pg_basebackup\\010_pg_basebackup\\data\\tmp_test_pjj2\\pg_replslot
### Starting node "main"
# Running: pg_ctl -w -D 
C:\\tools\\nmsys64\\home\\pgrunner\\bf\\root\\REL_16_STABLE\\pgsql.build/testrun/pg_basebackup/010_pg_basebackup/data/t_010_pg_basebackup_main_data/pgdata
 -l 
C:\\tools\\nmsys64\\home\\pgrunner\\bf\\root\\REL_16_STABLE\\pgsql.build/testrun/pg_basebackup/010_pg_basebackup/log/010_pg_basebackup_main.log
 -o --cluster-name=main start
waiting for server to start done
server started
# Postmaster PID for node "main" is 5184
Junction created for C:\\tools\\nmsys64\\tmp\\6zkMt003MF\\tempdir <<===>> 
C:\\tools\\nmsys64\\home\\pgrunner\\bf\\root\\REL_16_STABLE\\pgsql.build\\testrun\\pg_basebackup\\010_pg_basebackup\\data\\tmp_test_pjj2
# Taking pg_basebackup tarbackup2 from node "main"
# Running: pg_basebackup -D 
C:\\tools\\nmsys64\\home\\pgrunner\\bf\\root\\REL_16_STABLE\\pgsql.build/testrun/pg_basebackup/010_pg_basebackup/data/t_010_pg_basebackup_main_data/backup/tarbackup2
 -h C:/tools/nmsys64/tmp/63ohSgsh21 -p 54699 --checkpoint fast --no-sync -Ft
WARNING:  aborting backup due to backend exiting before pg_backup_stop was 
called
pg_basebackup: error: could not initiate base backup: ERROR:  could not get junction for 
"./pg_replslot": More data is available.


It's worth pointing out that the path for the replslot junction is almost as 
long as the original path.

Since this test is passing on HEAD which has slightly shorter paths, I'm 
wondering if we should change this:

rename("$pgdata/pg_replslot", "$tempdir/pg_replslot")
  or BAIL_OUT "could not move $pgdata/pg_replslot";
dir_symlink("$tempdir/pg_replslot", "$pgdata/pg_replslot")
  or BAIL_OUT "could not symlink to $pgdata/pg_replslot";

to use the much shorter $sys_tempdir created a few lines below.





Pushed a tested fix along those lines.


cheers


andrew

--
Andrew Dunstan
EDB:https://www.enterprisedb.com


Re: gcc -Wclobbered in PostgresMain

2023-07-08 Thread Tom Lane
Sergey Shinderuk  writes:
> While analyzing -Wclobbered warnings from gcc we found a true one in 
> PostgresMain():
> ...
> These variables must be declared volatile, because they are read after 
> longjmp(). send_ready_for_query, declared there, is already volatile.

Yeah, you're on to something there.

> Without volatile, these variables are kept in registers and restored by 
> longjmp(). I think this is harmless because the error handling code 
> calls disable_all_timeouts() anyway.

Hmm.  So what could happen (if these *aren't* in registers) is that we
might later uselessly call disable_timeout to get rid of timeouts that
are long gone anyway.  While that's not terribly expensive, it's not
great either.  What we ought to be doing is resetting these two flags
after the disable_all_timeouts call.

Having done that, it wouldn't really be necessary to mark these
as volatile.  I kept that marking anyway for consistency with 
send_ready_for_query, but perhaps we shouldn't?

> I also moved firstchar's declaration inside the loop where it's used, to 
> make it clear that this variable needn't be volatile and is not 
> preserved after longjmp().

Good idea, but then why not the same for input_message?  It's fully
reinitialized each time through the loop, too.

In short, something like the attached, except I'm not totally sold
on changing the volatility of the timeout flags.
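
(For anyone who wants to see the hazard in isolation: a standalone
illustration, not PostgreSQL code. A local variable modified between
sigsetjmp() and siglongjmp() has an indeterminate value after the jump
unless it is volatile.)

#include <setjmp.h>
#include <stdbool.h>
#include <stdio.h>

static sigjmp_buf env;

static void
fail(void)
{
	siglongjmp(env, 1);
}

int
main(void)
{
	/* without "volatile", may be read back from a stale register copy */
	volatile bool timeout_enabled = false;

	if (sigsetjmp(env, 0) == 0)
	{
		timeout_enabled = true;	/* modified between setjmp and longjmp */
		fail();
	}
	else
		printf("timeout_enabled = %d\n", timeout_enabled);
	return 0;
}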

regards, tom lane

diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 01b6cc1f7d..d18018671d 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -4111,12 +4111,12 @@ PostgresSingleUserMain(int argc, char *argv[],
 void
 PostgresMain(const char *dbname, const char *username)
 {
-	int			firstchar;
-	StringInfoData input_message;
 	sigjmp_buf	local_sigjmp_buf;
+
+	/* these must be volatile to ensure state is preserved across longjmp: */
 	volatile bool send_ready_for_query = true;
-	bool		idle_in_transaction_timeout_enabled = false;
-	bool		idle_session_timeout_enabled = false;
+	volatile bool idle_in_transaction_timeout_enabled = false;
+	volatile bool idle_session_timeout_enabled = false;
 
 	Assert(dbname != NULL);
 	Assert(username != NULL);
@@ -4324,6 +4324,8 @@ PostgresMain(const char *dbname, const char *username)
 		 */
 		disable_all_timeouts(false);
 		QueryCancelPending = false; /* second to avoid race condition */
+		idle_in_transaction_timeout_enabled = false;
+		idle_session_timeout_enabled = false;
 
 		/* Not reading from the client anymore. */
 		DoingCommandRead = false;
@@ -4418,6 +4420,9 @@ PostgresMain(const char *dbname, const char *username)
 
 	for (;;)
 	{
+		int			firstchar;
+		StringInfoData input_message;
+
 		/*
 		 * At top of loop, reset extended-query-message flag, so that any
 		 * errors encountered in "idle" state don't provoke skip.




Re: Problems with estimating OR conditions, IS NULL on LEFT JOINs

2023-07-08 Thread Tomas Vondra



On 7/8/23 10:29, Alena Rybakina wrote:
> 
>> Well, one option would be to modify all selectivity functions to do
>> something like the patch does for nulltestsel(). That seems a bit
>> cumbersome because why should those places care about maybe running on
>> the outer side of a join, or what? For code in extensions this would be
>> particularly problematic, I think.
> Agree. I would say that we can try it if nothing else works out.
>> So I was thinking about doing this in a way that'd make this
>> automatic, without having to modify the selectivity functions.
>>
>> Option (3) is very simple - examine_variable would simply adjust the
>> statistics by tweaking the null_frac field, when looking at variables on
>> the outer side of the join. But it has issues when estimating multiple
>> conditions.
>>
>> Imagine t1 has 1M rows, and we want to estimate
>>
>>    SELECT * FROM t1 LEFT JOIN t2 ON (t1.id = t2.id)
>>    WHERE ((t2.a=1) AND (t2.b=1))
>>
>> but only 50% of the t1 rows have a match in t2. Assume each of the t2
>> conditions matches 100% rows in the table. With the correction, this
>> means 50% selectivity for each condition. And if we combine them the
>> usual way, it's 0.5 * 0.5 = 0.25.
>>
>> But we know all the rows in the "matching" part match the condition, so
>> the correct selectivity should be 0.5.
>>
>> In a way, this is just another case of estimation issues due to the
>> assumption of independence.
>> FWIW, I used "AND" in the example for simplicity, but that'd probably be
>> pushed to the baserel level. There'd need to be OR to keep it at the
>> join level, but the overall issue is the same, I think.
>>
>> Also, this entirely ignores extended statistics - I have no idea how we
>> might tweak those in (3).
> 
> I understood the idea - it is very similar to what is implemented in the
> current patch.
> 
> But I don't understand how to do it in the examine_variable function, to
> be honest.
> 

Well, I don't have a detailed plan either. In principle it shouldn't be
that hard, I think - examine_variable is loading the statistics, so it
could apply the same null_frac correction, just like nulltestsel would
do a bit later.
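
The correction itself is trivial (a hedged sketch with an invented
helper name, not planner code):

static double
adjust_null_frac(double null_frac, double match_frac)
{
	/*
	 * Matched rows keep the column's original null fraction; the
	 * unmatched fraction of the outer rows is entirely NULL.
	 */
	return match_frac * null_frac + (1.0 - match_frac);
}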

The main question is how to pass the information to examine_variable. It
doesn't get the SpecialJoinInfo (which is what nulltestsel used), so
we'd need to invent something new ... add a new argument?

>> But (4) was suggesting we could improve this essentially by treating the
>> join as two distinct sets of rows
>>
>>   - the inner join result
>>
>>   - rows without match on the outer side
>>
>> For the inner part, we would do estimates as now (using the regular
>> per-column statistics). If we knew the conditions match 100% rows, we'd
>> still get 100% when the conditions are combined.
>>
>> For the second part of the join we know the outer side is just NULLs in
>> all columns, and that'd make the estimation much simpler for most
>> clauses. We'd just need to have "fake" statistics with null_frac=1.0 and
>> that's it.
>>
>> And then we'd just combine these two selectivities. If we know the inner
>> side is 50% and all rows match the conditions, and no rows in the other
>> 50% match, the selectivity is 50%.
>>
>> inner_part * inner_sel + outer_part * outer_sel = 0.5 * 1.0 + 0.5 * 0.0 = 0.5
>>
>> Now, we still have issues with independence assumption in each of these
>> parts separately. But that's OK, I think.
>>
>> I think (4) could be implemented by doing the current estimation for the
>>   inner part, and by tweaking examine_variable in the "outer" part in a
>> way similar to (3). Except that it just sets null_frac=1.0 everywhere.
>>
>> For (4) we don't need to tweak those at all,
>> because for inner part we can just apply them as is, and for outer part
>> it's irrelevant because everything is NULL.
> I like this idea the most) I'll try to start with this and implement the
> patch.

Good to hear.

>> I hope this makes more sense. If not, let me know and I'll try to
>> explain it better.
> 
> Thank you for your explanation)
> 
> I will follow up soon with the results or if I have any questions.
> 

OK
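
For the record, the combination rule in (4) is just a weighted sum
(a hedged sketch with invented names, not planner code):

static double
outer_join_clause_selectivity(double match_frac,	/* outer rows with a match */
							  double inner_sel,	/* clause sel. on matched rows */
							  double outer_sel)	/* clause sel. on NULL-extended rows */
{
	return match_frac * inner_sel + (1.0 - match_frac) * outer_sel;
}

With match_frac = 0.5, inner_sel = 1.0 and outer_sel = 0.0 this gives
the expected 0.5.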

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company





Re: [PATCH] Add support function for containment operators

2023-07-08 Thread Kim Johan Andersson

On 07-07-2023 13:20, Laurenz Albe wrote:

I wrote:

You implement both "SupportRequestIndexCondition" and "SupportRequestSimplify",
but when I experimented, the former was never called.  That does not
surprise me, since any expression of the shape "expr <@ range constant"
can be simplified.  Is the "SupportRequestIndexCondition" branch dead code?
If not, do you have an example that triggers it?


I would think it is dead code; I came to the same conclusion. So we can 
drop SupportRequestIndexCondition, since the simplification happens to 
take care of everything.




I had an idea about this:
So far, you only consider constant ranges.  But if we have a STABLE range
expression, you could use an index scan for "expr <@ range", for example
Index Cond (expr >= lower(range) AND expr < upper(range)).



I will try to look into this. Originally that was what I was hoping for, 
but I didn't see a way of going about it.
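
As I understand it, the rewrite amounts to this (a toy sketch with
invented types, not the patch; real code must also fetch the
comparison operators from the index opfamily and handle infinite or
empty ranges):

#include <stdbool.h>

typedef struct ToyRange
{
	double		lower;
	double		upper;
	bool		lower_inc;		/* '[' vs '(' */
	bool		upper_inc;		/* ']' vs ')' */
} ToyRange;

/*
 * "elem <@ range" decomposes into two scalar comparisons that a btree
 * index on the element column can use.
 */
static bool
elem_contained_by(double elem, ToyRange r)
{
	bool		ge_lower = r.lower_inc ? (elem >= r.lower) : (elem > r.lower);
	bool		le_upper = r.upper_inc ? (elem <= r.upper) : (elem < r.upper);

	return ge_lower && le_upper;
}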


Thanks for your comments; I will also look at the locale-related 
breakage you spotted.


Regards,
Kimjand